Statannot: box_pairs contains an invalid box pair?

I'm trying to perform Wilcoxon tests between pairs of boxes in a seaborn boxplot using statannot. I am using the following code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation

df = pd.read_pickle('metrics_total.pkl')
plt.figure(1)
ax = sns.boxplot(x="Location", y="value", hue="Input approach", data=df, showmeans=True,
                 meanprops={"marker": "x", "markerfacecolor": "white", "markeredgecolor": "white"})
ax.set_ylim([0, 1])
ax.set_ylabel('Metric value', fontsize=15)
ax.set_xlabel('Metric', fontsize=15)
add_stat_annotation(ax, data=df, x="Location", y="value", hue="Input approach",
                    box_pairs=[('DSC', 'Model 1'), ('DSC', 'Model 2')],
                    test='Wilcoxon', text_format='star',
                    loc='outside', verbose=2)
However, I am getting the following error:
ValueError: box_pairs contains an invalid box pair.
I am following an example from the statannot documentation, so I don't know what I am doing wrong.

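You are missing a level of parentheses when defining the box pairs. When hue is used, each box is identified by an (x, hue) tuple, and each entry of box_pairs must be a pair of such boxes, i.e. a tuple of two tuples. The following should work:
box_pairs=[(('DSC', 'Model 1'), ('DSC', 'Model 2'))],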
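If you need many such pairs, building them programmatically avoids nesting mistakes. The helper below is a hypothetical sketch (not part of statannot) that pairs every two hue levels inside each x category, producing the ((x, hue), (x, hue)) shape described above.

```python
from itertools import combinations


def hue_box_pairs(x_levels, hue_levels):
    """Hypothetical helper: pair every two hue levels inside each x category.

    Each box is an (x, hue) tuple and each pair is a tuple of two boxes,
    the nested shape statannot expects when hue is used.
    """
    pairs = []
    for x in x_levels:
        for hue_a, hue_b in combinations(hue_levels, 2):
            pairs.append(((x, hue_a), (x, hue_b)))
    return pairs


box_pairs = hue_box_pairs(['DSC'], ['Model 1', 'Model 2'])
print(box_pairs)  # [(('DSC', 'Model 1'), ('DSC', 'Model 2'))]
```

With the real data you could pass df["Location"].unique() and df["Input approach"].unique() as the two level lists, then hand the result to add_stat_annotation(..., box_pairs=box_pairs).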

Related

Debugging interactive matplotlib figures in Jupyter notebooks

The example below should highlight the point that you click on, and change the title of the graph to show the label associated with that point.
If I run this as a standalone Python script, clicking on a point raises the error "line 15, in onpick TypeError: only integer scalar arrays can be converted to a scalar index", which is expected. event.ind is an array of indices rather than a single integer, so I need to change that line to ind = event.ind[0] for the code to be correct.
However, when I run this in a Jupyter notebook, the figure appears, but the error is silently ignored, so it just appears that the code does not work. Is there a way to get Jupyter to show me that an error has occurred?
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0, 1, 2, 3, 4, 5]
labels = ['a', 'b', 'c', 'd', 'e', 'f']
ax.plot(x, 'bo', picker=5)
# this is the transparent marker for the selected data point
marker, = ax.plot([0], [0], 'yo', visible=False, alpha=0.8, ms=15)
def onpick(event):
    ind = event.ind
    ax.set_title('Data point {0} is labeled "{1}"'.format(ind, labels[ind]))
    marker.set_visible(True)
    marker.set_xdata(x[ind])
    marker.set_ydata(x[ind])
    ax.figure.canvas.draw()  # redraw the canvas so the new title and marker become visible
fig.canvas.mpl_connect('pick_event', onpick)
plt.show()
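One common workaround, sketched here as an assumption rather than a built-in Jupyter feature, is to wrap the callback so it catches its own exceptions and keeps the traceback. The names report_errors and callback_errors are hypothetical; the pick event is simulated with a SimpleNamespace so the sketch runs without a GUI.

```python
import functools
import traceback
from types import SimpleNamespace

# Hypothetical store for tracebacks, so a later notebook cell can
# inspect them with print(callback_errors[-1]).
callback_errors = []


def report_errors(callback):
    """Wrap an event callback so its exceptions are recorded and printed
    instead of being silently swallowed by the figure's event loop."""
    @functools.wraps(callback)
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception:
            tb = traceback.format_exc()
            callback_errors.append(tb)
            print(tb)  # whether this shows up in the cell depends on the backend
            return None
    return wrapper


labels = ['a', 'b', 'c', 'd', 'e', 'f']


@report_errors
def onpick(event):
    # Same mistake as in the question: event.ind is a sequence, not an int
    return labels[event.ind]


# Simulate a pick event; SimpleNamespace stands in for the real PickEvent
onpick(SimpleNamespace(ind=[2]))
print(callback_errors[-1].splitlines()[-1])
```

In the notebook you would connect the wrapped function instead, e.g. fig.canvas.mpl_connect('pick_event', report_errors(onpick)), and look at callback_errors in the next cell if nothing seems to happen.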

Matplotlib: Boxplot and bar chart shifted when overlaid using twinx

When I create a boxplot and overlay a bar chart using twinx, the boxes appear shifted one position to the right relative to the bars.
This problem has been identified before (Python pandas plotting shift x-axis if twinx two y-axes), but the solution no longer seems to work. (I am using Matplotlib 3.1.0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

li_str = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
# Each label gets a random number (5 to 14) of rows holding two random values (B, C)
sizes = np.random.randint(5, 15, len(li_str))
rows = [[name] + row
        for name, n in zip(li_str, sizes)
        for row in np.random.randn(n, 2).tolist()]
df = pd.DataFrame(rows, columns=['A', 'B', 'C'])

fig, ax = plt.subplots(figsize=(16, 6))
ax2 = ax.twinx()
df_gb = df.groupby('A').count()
p1 = df.boxplot(ax=ax, column='B', by='A', sym='')
p2 = df_gb['B'].plot(ax=ax2, kind='bar', figsize=(16, 6),
                     colormap='Set2', alpha=0.3, secondary_y=True)
plt.ylim([0, 20])
The output shows the boxes shifted one position to the right relative to the bars. The respondent to the previous post correctly pointed out that the tick locations of the bars are zero-based while the tick locations of the boxes are one-based, which causes the shift. However, the plt.bar() call that respondent used as a fix now throws an error, because x has become a required argument. If x is provided, it still throws an error, because bar() no longer has a 'left' parameter.
df.boxplot(column='B', by='A')
plt.twinx()
plt.bar(left=plt.xticks()[0], height=df.groupby('A').count()['B'],
align='center', alpha=0.3)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-186-e257461650c1> in <module>
26 plt.twinx()
27 plt.bar(left=plt.xticks()[0], height=df.groupby('A').count()['B'],
---> 28 align='center', alpha=0.3)
TypeError: bar() missing 1 required positional argument: 'x'
In addition, I would much prefer a fix using the object-oriented approach with reference to the axes, because I want to place the chart into an interactive ipywidget.
Here is the ideal chart:
Many thanks.

You can use the following trick: df.boxplot places the boxes at x = 1, 2, ..., n, so provide x-values for the bars that also start at x=1. To do so, pass range(1, len(df_gb['B']) + 1) as the x-values:
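fig, ax = plt.subplots(figsize=(8, 4))
ax2 = ax.twinx()  # second y-axis sharing the same x-axis
df_gb = df.groupby('A').count()
df.boxplot(column='B', by='A', ax=ax)  # boxes sit at x = 1 .. n
# shift the bars to x = 1 .. n so they line up with the boxes
ax2.bar(range(1, len(df_gb['B']) + 1), height=df_gb['B'], align='center', alpha=0.3)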
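A small variation on the same idea, offered as a sketch rather than the answer's exact method: instead of hard-coding range(1, n + 1), read the box positions back from the boxplot axes with ax.get_xticks() and reuse them for the bars. The three-group DataFrame below is made-up data so the example runs on its own, using the object-oriented API the question asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up data: 5 rows for 'one', 8 for 'two', 6 for 'three'
df = pd.DataFrame({
    'A': np.repeat(['one', 'two', 'three'], [5, 8, 6]),
    'B': rng.standard_normal(19),
})

fig, ax = plt.subplots(figsize=(8, 4))
ax2 = ax.twinx()
counts = df.groupby('A')['B'].count()  # groups come out sorted, like the boxes
df.boxplot(column='B', by='A', ax=ax)

# Reuse the box positions as bar positions so both share the same x grid
positions = ax.get_xticks()
bars = ax2.bar(positions, counts.values, align='center', alpha=0.3)
```

Both groupby and df.boxplot(by=...) sort the group keys by default, which is why the i-th count lines up with the i-th box here.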

Edit array of axis output from pandas plot method

I'm plotting from a pandas DataFrame with subplots, and as a result I get a NumPy array of axes:
array([<matplotlib.axes._subplots.AxesSubplot object at blablabla>,
<matplotlib.axes._subplots.AxesSubplot object at blablabla>,
<matplotlib.axes._subplots.AxesSubplot object at blablabla>])
I want to use this output to edit the titles and x labels and then save the figure as a PDF. If there were only one axes, I would store the output of .plot in a variable, say ax, set the title, and get the figure with fig = ax.get_figure() to save it the way I want. How can I do the same here?

Assign the result of df.plot to a variable, ax = df.plot(...), to get an array of axes. Then you can index into it to access each Axes object and call set_title, set_xlabel, etc., as below:
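import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
df = df.cumsum()
ax = df.plot(subplots=True)  # NumPy array with one Axes per column
ax[0].set_title('Series A')
ax[1].set_title('Series B')
ax[2].set_title('Series C')
ax[3].set_title('Series D')
fig = ax[0].get_figure()  # all subplots share the same Figure
fig.tight_layout()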
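The same steps can be written as a loop, which also covers the x labels and the PDF export from the question. This is a minimal sketch: the x label text 'Step' is an arbitrary choice, and it saves into an in-memory buffer so it runs anywhere; in practice you would pass a file name such as 'figure.pdf' to savefig instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a normal session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD')).cumsum()
axes = df.plot(subplots=True, figsize=(6, 8))  # array with one Axes per column

# Each element of the array is an ordinary Axes object
for ax, name in zip(axes, df.columns):
    ax.set_title(f'Series {name}')
    ax.set_xlabel('Step')

fig = axes[0].get_figure()  # every subplot lives in this one Figure
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format='pdf')  # or fig.savefig('figure.pdf')
```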

matplotlib legend shows "Container object of 10 artists" instead of label

The code below produces the plot and the legend, but the legend does not show the specified label text; instead, it shows "Container object of 10 artists".
ax = plt.subplot(212)
plt.tight_layout()
l1= plt.bar(X+0.25, Y1, 0.45, align='center', color='r', label='A', edgecolor='black', hatch="/")
l2= plt.bar(X, Y2, 0.45, align='center', color='b', label='N',hatch='o', fill=False)
ax.autoscale(tight=True)
plt.xticks(X, X_values)
ymax1 = max(Y1) + 1
ymax2 = max(Y2) + 1
ymax = max(ymax1,ymax2)+1
plt.ylim(0, ymax)
plt.grid(True)
plt.legend([l1,l2], loc='upper right', prop={'size':20})
The output is shown below:
How can I correctly display the labels for each bar (as specified in the plt.bar() function) on the legend?

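The problem stems from mixing two approaches to using plt.legend(). When legend() receives a single list, it treats that list as the labels, so each bar container is converted to its string representation ("Container object of 10 artists"). You have two options:
1. Manually specify the labels for the legend.
2. Use ax.get_legend_handles_labels() to fill them in with the label parameters you passed to plt.bar().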
To manually specify the labels, pass them as the second argument to your call to plt.legend() as follows:
plt.legend([l1,l2], ["A", "N"], loc='upper right', prop={'size':20})
If instead you want to automatically populate the legend you can use the following to find legend-able objects in the plot and their labels:
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right', prop={'size':20})
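A runnable sketch of both the pitfall and the automatic approach follows. X, Y1, Y2 and X_values are not defined in the question, so the arrays below are hypothetical stand-ins. Calling ax.legend() with no positional arguments collects every artist that has a label, which is equivalent to passing the result of ax.get_legend_handles_labels().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for X, Y1, Y2 and X_values from the question
X = np.arange(10)
Y1 = np.arange(1, 11)
Y2 = np.arange(10, 0, -1)
X_values = [f'x{i}' for i in X]

fig, ax = plt.subplots()
l1 = ax.bar(X + 0.25, Y1, 0.45, align='center', color='r', label='A',
            edgecolor='black', hatch='/')
l2 = ax.bar(X, Y2, 0.45, align='center', color='b', label='N',
            hatch='o', fill=False)
ax.set_xticks(X)
ax.set_xticklabels(X_values)
ax.set_ylim(0, max(Y1.max(), Y2.max()) + 2)

# Pitfall: a single list is read as labels, so each container is str()-ed
bad = ax.legend([l1, l2])
bad_texts = [t.get_text() for t in bad.get_texts()]

# Fix: with no positional arguments, legend() uses the label= of each bar set
good = ax.legend(loc='upper right', prop={'size': 20})
good_texts = [t.get_text() for t in good.get_texts()]
print(bad_texts)
print(good_texts)
```

The second ax.legend() call replaces the first one, so only the correct legend remains on the figure.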

Keyword arguments in matplotlib radviz

I am trying to understand the keyword arguments that can be used in matplotlib radviz. I am using the well-known iris dataset, and the simple code below:
import pandas as pd
import matplotlib.pyplot as plt
plt.xkcd()
iris = pd.read_csv("iris.csv")
pd.plotting.radviz(iris, "name")  # pd.tools.plotting.radviz in old pandas versions
Generating the following chart:
How can I set up the dimensions (x, y) and the title of the chart? How can I specify the placement of the legend? What other arguments (if any) can be used with radviz?
Thank you very much for your help.

All the pandas plotting tools take an ax argument, so you can create the axes yourself and pass it to the plotting function:
fig = plt.figure()
ax = fig.add_axes([.05, .05, .9, .9], title='whatever title')
pd.plotting.radviz(iris, 'name', ax=ax)
Then, if you need to change the legend, you can do:
ax.legend(loc='center right', fontsize='medium')
or change the title:
ax.set_title('new title')
Alternatively, the plotting tools return the axes after plotting, so you can do
ax = pd.plotting.radviz(iris, 'name')
and check dir(ax) to see what functionality is available. For example, combined with the xkcd style:
with plt.xkcd():
    ax = pd.plotting.radviz(iris, 'name')
    ax.legend(loc='center left', bbox_to_anchor=(0, 1),
              fontsize='medium', fancybox=True, ncol=3)
    ax.set_xlim(-1.6, 1.6, emit=True, auto=False)
    ax.set_title('iris - radviz', loc='right')
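The question also asks about the chart's dimensions. A minimal sketch, assuming a recent pandas where the function lives at pd.plotting.radviz: set figsize (in inches) when creating the figure, then hand the axes to radviz. Since iris.csv is not included, the DataFrame below is random stand-in data with hypothetical column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for iris.csv: four numeric features and a class column
frame = pd.DataFrame(rng.random((30, 4)), columns=['sl', 'sw', 'pl', 'pw'])
frame['name'] = np.repeat(['setosa', 'versicolor', 'virginica'], 10)

fig, ax = plt.subplots(figsize=(6, 6))  # figure dimensions in inches
pd.plotting.radviz(frame, 'name', ax=ax)
ax.set_title('iris - radviz')
# Re-create the legend outside the unit circle, to the right of the axes
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
fig.tight_layout()
```

Passing ax= keeps control of the figure size and layout on your side, while radviz only draws into the axes you give it.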
