"panel barchart" in matplotlib - matplotlib

I would like to produce a figure like this one using matplotlib:
(source: peltiertech.com)
My data are in a pandas DataFrame, and I've gotten as far as a regular stacked barchart, but I can't figure out how to do the part where each category is given its own y-axis baseline.
Ideally, I would like the vertical scale to be exactly the same for all subplots, with the panel labels moved off to the side so that there are no gaps between the rows.

I haven't exactly replicated what you want, but this should get you pretty close.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
#create dummy data
cols = ['col'+str(i) for i in range(10)]
ind = ['ind'+str(i) for i in range(10)]
df = pd.DataFrame(np.random.normal(loc=10, scale=5, size=(10, 10)), index=ind, columns=cols)
#create plot
sns.set_style("whitegrid")
axs = df.plot(kind='bar', subplots=True, sharey=True,
figsize=(6, 5), legend=False, yticks=[],
grid=False, ylim=(0, 14), edgecolor='none',
fontsize=14, color=[sns.xkcd_rgb["brownish red"]])
fig = axs[0].get_figure()
fig.text(0.02, 0.5, "The y-axis label", fontsize=14, rotation=90, va='center') # y-label in figure coordinates (x=0.02 is an arbitrary left margin)
sns.despine(left=True) # remove the top, right and left spines
for ax in axs: # set the names beside the axes
    for line in ax.lines: # hide any line pandas drew (the "ugly dashed line"); no IndexError if there is none
        line.set_visible(False)
    ax.set_title('')
    sername = ax.get_legend_handles_labels()[1][0]
    ax.text(9.8, 5, sername, fontsize=14)
plt.suptitle("My panel chart", fontsize=18)
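For comparison, the layout asked for in the question (identical vertical scale, no gaps between rows, labels beside the panels) can also be built with plain matplotlib subplots instead of pandas' subplots=True. This is a minimal sketch, assuming dummy data shaped like the answer's; the colour, figure size, the zero baseline drawn with axhline and the label position (1.01 in axes coordinates) are arbitrary choices of the sketch, not taken from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy data shaped like the answer's: 10 categories (index) x 10 series (columns)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(loc=10, scale=5, size=(10, 10)),
                  index=['ind' + str(i) for i in range(10)],
                  columns=['col' + str(i) for i in range(10)])

# One row per series: sharey gives every panel the same vertical scale,
# and hspace=0 removes the gaps between rows.
fig, axs = plt.subplots(len(df.columns), 1, sharex=True, sharey=True,
                        figsize=(6, 8), gridspec_kw={'hspace': 0})
x = np.arange(len(df.index))
for ax, name in zip(axs, df.columns):
    ax.bar(x, df[name], width=0.8, color='firebrick', edgecolor='none')
    ax.set_yticks([])
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.axhline(0, color='black', linewidth=0.8)  # baseline at zero for each panel
    # Panel label just right of the axes, vertically centred on its panel
    ax.text(1.01, 0.5, name, transform=ax.transAxes, ha='left', va='center')

values = df.to_numpy()
axs[0].set_ylim(min(0, values.min()), values.max())  # shared, so applies to all panels
axs[-1].set_xticks(x)
axs[-1].set_xticklabels(df.index, rotation=90)
fig.suptitle("My panel chart")
```

Because the y-axes are shared, setting the limits once on axs[0] keeps every panel on exactly the same scale.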

Related

Trying to place text in mpl just above the first yticklabel

I am having difficulty moving the text "Rank" exactly one line above the first label without resorting to guesswork, as I have different chart types with variable sizes, widths, and paddings between the labels and bars.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from matplotlib import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)))
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
ax.text(-5,30,"Rank")
plt.show()
Using transData.transform didn't get me any further. The problem seems to be that ax.text() with position (0, 0) aligns with the start of the bars rather than with the y tick labels, which is what I need, so knowing the exact position of the y tick labels relative to the axis would help.
The following approach creates an offset_copy transform based on axes coordinates. The top-left corner of the main plot is at position (0, 1) in axes coordinates. The ticks have a "pad" (the gap between label and tick mark) and a "padding" (the length of the tick mark), both measured in points.
The text can be right-aligned, just like the tick labels. With va='bottom', it sits just above the main plot. If that is too close, you could try ax.text(0, 1.01, ...) to move it a bit higher.
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
import pandas as pd
import numpy as np
from matplotlib import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1, 30)))
df.plot.barh(width=0.8, ax=ax, legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
tick = ax.yaxis.get_major_ticks()[-1] # get information of one of the ticks
padding = tick.get_pad() + tick.get_tick_padding()
trans_offset = offset_copy(ax.transAxes, fig=fig, x=-padding, y=0, units='points')
ax.text(0, 1, "Rank", ha='right', va='bottom', transform=trans_offset)
# optionally also use tick.label1.get_fontproperties() (tick.label is deprecated in newer matplotlib)
plt.tight_layout()
plt.show()
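The question also asks for the exact position of the y tick labels. A different route, sketched here under the assumption that measuring after one draw is acceptable, is to render once, read the top tick label's bounding box in display coordinates, and convert it to axes coordinates with ax.transAxes.inverted(). The sketch uses plain ax.barh instead of the pandas DataFrame, and the measured box is only valid for the current figure size, so it would need re-running after resizing or tight_layout().

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(np.arange(29), np.arange(1, 30), height=0.8)
ax.set_yticks(np.arange(29))
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
for spine in ax.spines.values():
    spine.set_visible(False)
ax.set_title("Rankings")

# Text extents only exist after a draw, so render once before measuring.
fig.canvas.draw()
renderer = fig.canvas.get_renderer()
top_label = ax.get_yticklabels()[-1]          # label of the topmost bar
bbox = top_label.get_window_extent(renderer)  # display (pixel) coordinates

# Convert the label's right edge and top edge into axes coordinates.
x_right, y_top = ax.transAxes.inverted().transform((bbox.x1, bbox.y1))
rank = ax.text(x_right, y_top, "Rank", ha='right', va='bottom',
               transform=ax.transAxes,
               fontproperties=top_label.get_fontproperties())
```

Right-aligning on the label's right edge mirrors how y tick labels are aligned, so "Rank" lines up with the labels whatever pad is used.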
I answered my own question while Johan was posting his, which is pretty good and what I wanted. I'm posting mine anyway because it uses an entirely different approach: I add a "ghost" row to the DataFrame and label it appropriately, which solves the problem:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from matplotlib import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)),columns=["val"])
#add a temporary header
new_row = pd.DataFrame({"val":0}, index=[0])
df = pd.concat([df[:],new_row]).reset_index(drop = True)
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
# Set the top label to "Rank"
yticklabels = [t.get_text() for t in ax.get_yticklabels()]
yticklabels[-1] = "Rank"
# Left-align all labels (set_yticklabels accepts Text properties such as ha)
ax.set_yticklabels(yticklabels, ha="left")
# Hide the ghost bar: its value is already 0, and a bar height (thickness) of 0 removes it completely
ax.patches[-1].set_height(0)
plt.show()
The advantage is that "Rank" always sits a constant distance above the top label; the disadvantage is that transforming the DataFrame for this task is a bit "patchy", in the most literal sense.

Pandas plot of a stacked and grouped bar chart

I have a CSV file with the following data:
# Comment header
#
MainCategory,SubCategory,DurationM,DurationH,Number
MainCat1,Sub1.1,598,9.97,105
MainCat1,Sub1.2,11,0.18,4
MainCat1,Sub1.3,17,0.28,5
MainCat1,Sub1.4,16,0.27,2
MainCat2,Sub2.1,14161,236.02,102
MainCat2,Sub2.2,834,13.90,17
MainCat3,Sub3.1,4325,72.08,472
MainCat3,Sub3.2,7,0.12,2
MainCat4,Sub4.1,614,10.23,60
MainCat5,Sub5.1,6362,106.03,142
MainCat5,Sub5.2,141,2.35,6
Misc,Misc.1,3033,50.55,53
MainCat4,Sub4.2,339,5.65,4
MainCat4,Sub4.3,925,15.42,11
Misc,Misc.2,2641,44.02,28
MainCat6,Sub6.1,370,6.17,4
MainCat7,Sub7.1,9601,160.02,10
MainCat4,Sub4.4,75,1.25,2
MainCat8,Sub8.1,148,2.47,4
MainCat8,Sub8.2,680,11.35,7
MainCat9,Sub9.1,3997,66.62,1
MainCat8,Sub8.3,105,1.75,2
MainCat4,Sub4.5,997,16.62,1
MainCat10,Sub10.1,12,0.20,3
MainCat4,Sub4.6,10,0.17,1
MainCat10,Sub10.2,13,0.22,1
MainCat4,Sub4.7,561,9.35,4
MainCat10,Sub10.3,1043,17.38,47
What I would like to achieve is a stacked bar plot where
the x-axis values/labels are given by the groups in MainCategory
DurationH is plotted on the left y-axis
Number is plotted on the right y-axis
DurationH and Number are plotted as side-by-side bars for each MainCategory
within each bar, the SubCategory is used for stacking
Something like this:
The following code produces stacked plots, but a sequence of them:
import pandas as pd
from pandas import DataFrame
from matplotlib import pyplot as plt
import seaborn as sns
data = pd.read_csv('failureEventStatistic_total_Top10.csv', sep=',', header=2)
data = data.rename(columns={'DurationM':'Duration [min]', 'DurationH':'Duration [h]'})
data.groupby('MainCategory')[['Duration [h]', 'Number']].plot.bar()
I tried to use unstack(), but this produces an error
You can get the plot data from a crosstab and then make a right aligned and a left aligned bar plot on the same axes:
import numpy as np
import pandas as pd
df = pd.read_csv('failureEventStatistic_total_Top10.csv', header=2) # the CSV from the question
# negative width with align='edge' puts these bars left of each tick; the positive-width bars below go right
ax = pd.crosstab(df.MainCategory, df.SubCategory.str.partition('.')[2], df.DurationH, aggfunc='sum').plot.bar(
stacked=True, width=-0.4, align='edge', ylabel='DurationH', ec='w', color=[(0,1,0,x) for x in np.linspace(1, 0.1, 7)], legend=False)
h_durationh, _ = ax.get_legend_handles_labels()
ax = pd.crosstab(df.MainCategory, df.SubCategory.str.partition('.')[2], df.Number, aggfunc='sum').plot.bar(
stacked=True, width=0.4, align='edge', secondary_y=True, ec='w', color=[(0,0,1,x) for x in np.linspace(1, 0.1, 7)], legend=False, ax=ax)
h_number, _ = ax.get_legend_handles_labels()
ax.set_ylabel('Number')
ax.set_xlim(left=ax.get_xlim()[0] - 0.5)
ax.legend([h_durationh[0], h_number[0]], ['DurationH', 'Number'])
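The crosstab call is what reshapes the long CSV into one row per MainCategory and one column per sub-index, the wide layout that plot.bar(stacked=True) stacks. A minimal sketch on a handful of rows copied from the question's CSV shows that intermediate table for the Number column; passing the string "sum" rather than the builtin sum is a choice of this sketch, which recent pandas versions warn less about.

```python
import io
import pandas as pd

# A few rows copied from the question's CSV (comment header lines omitted)
csv_text = """MainCategory,SubCategory,DurationM,DurationH,Number
MainCat1,Sub1.1,598,9.97,105
MainCat1,Sub1.2,11,0.18,4
MainCat2,Sub2.1,14161,236.02,102
MainCat2,Sub2.2,834,13.90,17
MainCat3,Sub3.1,4325,72.08,472
"""
df = pd.read_csv(io.StringIO(csv_text))

# "Sub1.2".partition('.') -> ("Sub1", ".", "2"); column 2 holds the sub-index
sub_index = df.SubCategory.str.partition('.')[2]

# Rows: MainCategory, columns: sub-index, cells: summed Number.
# Combinations that do not occur (MainCat3 has no ".2") become NaN.
number_wide = pd.crosstab(df.MainCategory, sub_index, values=df.Number, aggfunc="sum")
print(number_wide)
```

Each column of this table becomes one stacked segment, which is why the answer passes one colour per sub-index.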

Align multi-line ticks in Seaborn plot

I have the following heatmap:
I've broken up the category names at each capital letter and then upper-cased them. By default, this gives a centered look to the multi-line labels on my x-axis, which I'd like to replicate on my y-axis.
import re
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# mask and annot_kws are defined elsewhere (not shown)
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, count=0, flags=re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, count=0, flags=re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
cmap="Blues", annot_kws=annot_kws,
mask=mask, vmin=0, vmax=5000,
cbar_kws={"shrink": .8}, square=True,
linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't find an alignment parameter for the tick labels. Can this centering be achieved in seaborn/matplotlib, and if so, how?
I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
columns=['Donald\nDuck','Mickey\nMouse','Han\nSolo',
'Luke\nSkywalker','Yoda','Santa\nClause','Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
square=True, linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops above: they fetch each tick label and set its horizontal alignment. You can also change the vertical alignment with set_va().
The code above produces this:
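The loops above set only the horizontal alignment. For multi-line text, matplotlib uses that same value to align the lines within the label unless a separate multialignment is set, so ha='right' also right-aligns each line. If the goal is the x-axis look (each line centred) while the label block still ends at the tick, the two can be set independently. A minimal sketch on a smaller random correlation matrix, assuming a recent seaborn/matplotlib:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

names = ['Donald\nDuck', 'Mickey\nMouse', 'Han\nSolo', 'Yoda']
rs = np.random.RandomState(33)
corr = pd.DataFrame(rs.normal(size=(100, 4)), columns=names).corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, ax=ax, square=True)
for label in ax.get_yticklabels():
    label.set_rotation(0)
    label.set_ha('right')               # the whole text block ends at the tick
    label.set_multialignment('center')  # each line is centred within the block
```

Here horizontalalignment positions the label block relative to the tick, while multialignment only controls how its lines line up with each other.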

Arrange two plots horizontally

As an exercise, I'm reproducing a plot from The Economist with matplotlib.
So far, I can generate random data and produce the two plots independently. Now I'm struggling to put them next to each other horizontally.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
"cable": np.random.randint(100, 250, size=8),
"streaming" : np.random.randint(10, 50, size=8)},
index=pd.Series(np.arange(2009,2017),name='year'))
df1.plot.bar(stacked=True)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.random_sample(size=7))},
index=pd.Series(np.arange(2009,2016),name='year'))
plt.figure()
fig, ax1 = plt.subplots()
ax1.plot(df2['avg_hour'])
ax2 = ax1.twinx()
ax2.bar(range(2009,2016), height=df2['usage'])
plt.show()
You should try using subplots. First, create a figure with plt.figure(). Then add a subplot with fig.add_subplot(121), where the first 1 is the number of rows, 2 is the number of columns, and the last 1 is the index of this plot. Plot the first DataFrame on it, making sure to pass the created axis ax1. Then add the second subplot(122) and repeat for the second DataFrame. I renamed your axis ax2 to ax3, since there are now three axes on one figure. The code below produces what I believe you are looking for; you can then work on the aesthetics of each plot separately.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
df1 = pd.DataFrame({"broadcast": np.random.randint(110, 150,size=8),
"cable": np.random.randint(100, 250, size=8),
"streaming" : np.random.randint(10, 50, size=8)},
index=pd.Series(np.arange(2009,2017),name='year'))
ax1 = fig.add_subplot(121)
df1.plot.bar(stacked=True,ax=ax1)
df2 = pd.DataFrame({'usage': np.sort(np.random.randint(1,50,size=7)),
'avg_hour': np.sort(np.random.randint(0,3, size=7) + np.random.random_sample(size=7))},
index=pd.Series(np.arange(2009,2016),name='year'))
ax2 = fig.add_subplot(122)
ax2.plot(df2['avg_hour'])
ax3 = ax2.twinx()
ax3.bar(range(2009,2016), height=df2['usage'])
plt.show()
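The same layout can be written a little more compactly with plt.subplots(1, 2), which creates the figure and both axes in one call. This sketch assumes NumPy's Generator API (default_rng, integers, random) in place of the legacy np.random functions, and it puts the bars on the primary axis and the line on the twin axis so the line is drawn on top; that ordering is a choice of the sketch, not of the answer.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng()
df1 = pd.DataFrame({"broadcast": rng.integers(110, 150, size=8),
                    "cable": rng.integers(100, 250, size=8),
                    "streaming": rng.integers(10, 50, size=8)},
                   index=pd.Series(np.arange(2009, 2017), name='year'))
df2 = pd.DataFrame({'usage': np.sort(rng.integers(1, 50, size=7)),
                    'avg_hour': np.sort(rng.integers(0, 3, size=7) + rng.random(size=7))},
                   index=pd.Series(np.arange(2009, 2016), name='year'))

# One figure, one row, two columns: ax_left and ax_right sit side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 4))
df1.plot.bar(stacked=True, ax=ax_left)

# Right panel: bars for usage, plus a twin y-axis for the avg_hour line.
ax_right.bar(df2.index, df2['usage'])
ax_hours = ax_right.twinx()
ax_hours.plot(df2.index, df2['avg_hour'], color='black')
fig.tight_layout()
```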

Labels in Plots

I am having some issues adding labels to the legend. For some reason matplotlib is ignoring the labels I create in the dataframe. Any help?
pandas version: 0.13.0
matplotlib version: 1.3.1
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
# Sample dataframe
d = {'date': [pd.to_datetime('1/1/2013'), pd.to_datetime('1/1/2014'), pd.to_datetime('1/1/2015')],
'number': [1,2,3],
'letter': ['A','B','C']}
df = pd.DataFrame(d)
####################
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 10))
fig.subplots_adjust(wspace=0.3) ## wspace (not hspace) sets the gap between side-by-side plots
# Chart 1
df.plot(ax=axes[0], label='one')
# Chart 2
df.set_index('date')['number'].plot(ax=axes[1], label='two')
# add a little sugar
axes[0].set_title('This is the title')
axes[0].set_ylabel('the y axis')
axes[0].set_xlabel('the x axis')
axes[0].legend(loc='best')
axes[1].legend(loc='best');
The problem is that Chart 1 is returning the legend as "number" and I want it to say "one".
I'll illustrate this for the first axis; you can repeat it for the second.
In [72]: fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 10))
Get a reference to the axis
In [73]: ax=df.plot(ax=axes[0])
Get the legend
In [74]: legend = ax.get_legend()
Get the text of the legend
In [75]: text = legend.get_texts()[0]
Printing the current text of the legend
In [77]: text.get_text()
Out[77]: u'number'
Setting the desired text
In [78]: text.set_text("one")
Drawing to update
In [79]: plt.draw()
The following figure shows the changed legend for the first axis. You can do the same for the other axis.
NB: IPython autocomplete helped a lot to figure out this answer!
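In recent pandas versions the label can also be set at plot time: when DataFrame.plot is given a single column through y=, its label= argument replaces the column name in the legend. A sketch under that assumption (not checked against the pandas 0.13 used in the question), with x='date' and y='number' chosen to match the question's data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2013-01-01', '2014-01-01', '2015-01-01']),
                   'number': [1, 2, 3],
                   'letter': ['A', 'B', 'C']})

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(13, 10))
# Selecting a single column with y= lets pandas use label= as the legend text.
df.plot(ax=axes[0], x='date', y='number', label='one')
# For a Series, label= is used directly (this part already worked in the question).
df.set_index('date')['number'].plot(ax=axes[1], label='two')
for ax in axes:
    ax.legend(loc='best')
```

Without y=, DataFrame.plot names each line after its column, which is why the question's label='one' was ignored.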