Pandas plot of a stacked and grouped bar chart - pandas

I have a CSV file with the following data:
# Comment header
#
MainCategory,SubCategory,DurationM,DurationH,Number
MainCat1,Sub1.1,598,9.97,105
MainCat1,Sub1.2,11,0.18,4
MainCat1,Sub1.3,17,0.28,5
MainCat1,Sub1.4,16,0.27,2
MainCat2,Sub2.1,14161,236.02,102
MainCat2,Sub2.2,834,13.90,17
MainCat3,Sub3.1,4325,72.08,472
MainCat3,Sub3.2,7,0.12,2
MainCat4,Sub4.1,614,10.23,60
MainCat5,Sub5.1,6362,106.03,142
MainCat5,Sub5.2,141,2.35,6
Misc,Misc.1,3033,50.55,53
MainCat4,Sub4.2,339,5.65,4
MainCat4,Sub4.3,925,15.42,11
Misc,Misc.2,2641,44.02,28
MainCat6,Sub6.1,370,6.17,4
MainCat7,Sub7.1,9601,160.02,10
MainCat4,Sub4.4,75,1.25,2
MainCat8,Sub8.1,148,2.47,4
MainCat8,Sub8.2,680,11.35,7
MainCat9,Sub9.1,3997,66.62,1
MainCat8,Sub8.3,105,1.75,2
MainCat4,Sub4.5,997,16.62,1
MainCat10,Sub10.1,12,0.20,3
MainCat4,Sub4.6,10,0.17,1
MainCat10,Sub10.2,13,0.22,1
MainCat4,Sub4.7,561,9.35,4
MainCat10,Sub10.3,1043,17.38,47
What I would like to achieve is a stacked bar plot where:
the X-axis labels are the groups given by MainCategory,
the left Y-axis shows DurationH,
the right Y-axis shows Number,
DurationH and Number are plotted side by side as two bars per MainCategory,
within each bar, the SubCategory values are stacked.
Something like this:
The following code produces stacked plots, but a sequence of them:
import pandas as pd
from pandas import DataFrame
from matplotlib import pyplot as plt
import seaborn as sns
data = pd.read_csv('failureEventStatistic_total_Top10.csv', sep=',', header=2)
data = data.rename(columns={'DurationM':'Duration [min]', 'DurationH':'Duration [h]'})
data.groupby('MainCategory')[['Duration [h]', 'Number']].plot.bar()
I tried to use unstack(), but this produces an error

You can get the plot data from a crosstab and then make a right aligned and a left aligned bar plot on the same axes:
import numpy as np
import pandas as pd
df = pd.read_csv('failureEventStatistic_total_Top10.csv', header=2)  # original column names
sub = df.SubCategory.str.partition('.')[2]  # position within the main category
# DurationH: bars extend to the left of each tick (negative width), left y-axis
ax = pd.crosstab(df.MainCategory, sub, values=df.DurationH, aggfunc='sum').plot.bar(
    stacked=True, width=-0.4, align='edge', ylabel='DurationH', ec='w',
    color=[(0, 1, 0, x) for x in np.linspace(1, 0.1, 7)], legend=False)
h_durationh, _ = ax.get_legend_handles_labels()
# Number: bars extend to the right of each tick, secondary (right) y-axis
ax = pd.crosstab(df.MainCategory, sub, values=df.Number, aggfunc='sum').plot.bar(
    stacked=True, width=0.4, align='edge', secondary_y=True, ec='w',
    color=[(0, 0, 1, x) for x in np.linspace(1, 0.1, 7)], legend=False, ax=ax)
h_number, _ = ax.get_legend_handles_labels()
ax.set_ylabel('Number')
ax.set_xlim(left=ax.get_xlim()[0] - 0.5)
ax.legend([h_durationh[0], h_number[0]], ['DurationH', 'Number'])
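To see what the crosstab hands to plot.bar, here is a minimal sketch on a few rows copied from the question's CSV. The variable names (sub_index, durations, numbers) are only illustrative. Each column of the result becomes one layer of the stacked bar, and missing sub-categories show up as NaN.

```python
import pandas as pd

# A few rows copied from the question's CSV.
df = pd.DataFrame({
    "MainCategory": ["MainCat1", "MainCat1", "MainCat2", "MainCat2", "MainCat3"],
    "SubCategory": ["Sub1.1", "Sub1.2", "Sub2.1", "Sub2.2", "Sub3.1"],
    "DurationH": [9.97, 0.18, 236.02, 13.90, 72.08],
    "Number": [105, 4, 102, 17, 472],
})

# Position of each sub-category within its main category ("Sub1.2" -> "2").
sub_index = df["SubCategory"].str.partition(".")[2]

# One row per main category, one column per sub-category position.
durations = pd.crosstab(df["MainCategory"], sub_index,
                        values=df["DurationH"], aggfunc="sum")
numbers = pd.crosstab(df["MainCategory"], sub_index,
                      values=df["Number"], aggfunc="sum")
print(durations)
print(numbers)
```

Because both tables share the same index, the two bar plots line up on the same x positions.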

Related

Plot multiple mplfinance plots sharing x axis

I am trying to plot 5 charts one under the other with mplfinance.
This works:
for coin in coins:
    mpf.plot(df_coins[coin], title=coin, type='line', volume=True, show_nontrading=True)
However each plot is a separate image in my Python Notebook cell output. And the x-axis labelling is repeated for each image.
I tried to make a single figure containing multiple subplots (axes) and to plot one chart into each of them:
from matplotlib import pyplot as plt
N = len(df_coins)
fig, axes = plt.subplots(N, figsize=(20, 5*N), sharex=True)
for (coin, df), ax in zip(df_coins.items(), axes):
    mpf.plot(df, ax=ax, title=coin, type='line', volume=True, show_nontrading=True)
This displays subfigures of the correct dimensions, however they are not getting populated with data. Axes are labelled from 0.0 to 1.0 and the title is not appearing.
What am I missing?
There are two ways to create the subplots. One is to build the figure with mplfinance's own figure object (mpf.figure). The other is to pass the axes of the matplotlib subplots you already created to mpf.plot.
yfinance data
import matplotlib.pyplot as plt
import mplfinance as mpf
import yfinance as yf
tickers = ['AAPL','GOOG','TSLA']
data = yf.download(tickers, start="2021-01-01", end="2021-03-01", group_by='ticker')
aapl = data[('AAPL',)]
goog = data[('GOOG',)]
tsla = data[('TSLA',)]
mplfinance
fig = mpf.figure(style='yahoo', figsize=(12,9))
#fig.subplots_adjust(hspace=0.3)
ax3 = fig.add_subplot(3,1,3)  # create the bottom axes first so the others can share its x-axis
ax1 = fig.add_subplot(3,1,1, sharex=ax3)
ax2 = fig.add_subplot(3,1,2, sharex=ax3)
mpf.plot(aapl, type='line', ax=ax1, axtitle='AAPL', xrotation=0)
mpf.plot(goog, type='line', ax=ax2, axtitle='GOOG', xrotation=0)
mpf.plot(tsla, type='line', ax=ax3, axtitle='TSLA', xrotation=0)
ax1.tick_params(labelbottom=False)  # hide x tick labels on the upper axes only
ax2.tick_params(labelbottom=False)
matplotlib
N = len(tickers)
fig, axes = plt.subplots(N, figsize=(20, 5*N), sharex=True)
for df, t, ax in zip([aapl, goog, tsla], tickers, axes):
    mpf.plot(df, ax=ax, axtitle=t, type='line', show_nontrading=True)  # volume=True
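The shared-axis layout used above can be tried without any market data. The following is a rough sketch in plain matplotlib, where random walks stand in for the mpf.plot(..., ax=ax) calls; the data and variable names are made up for illustration. It shows that the upper axes follow the x-limits of the bottom one and that tick_params hides labels per axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(50)

fig = plt.figure(figsize=(8, 6))
ax3 = fig.add_subplot(3, 1, 3)              # bottom axes first
ax1 = fig.add_subplot(3, 1, 1, sharex=ax3)  # upper axes reuse its x-axis
ax2 = fig.add_subplot(3, 1, 2, sharex=ax3)

for ax in (ax1, ax2, ax3):
    ax.plot(x, rng.normal(size=50).cumsum())  # stand-in for mpf.plot(..., ax=ax)

# Hide the x tick labels on the upper two axes only.
for ax in (ax1, ax2):
    ax.tick_params(labelbottom=False)

# Changing the limits on one axes changes them on all sharing axes.
ax3.set_xlim(10, 40)
```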
In addition to the techniques mentioned by @r-beginners, there is another technique that may work for you when all plots share the same x-axis: use mpf.make_addplot().
aps = []
for coin in coins[1:]:
    aps.append(mpf.make_addplot(df_coins[coin]['Close'], title=coin, type='line'))
coin = coins[0]
mpf.plot(df_coins[coin], axtitle=coin, type='line', volume=True, show_nontrading=True, addplot=aps)
If you choose to do type='candle' instead of 'line', then change
df_coins[coin]['Close']
to simply
df_coins[coin]

Align multi-line ticks in Seaborn plot

I have the following heatmap:
I've broken up the category names by each capital letter and then capitalised them. This achieves a centering effect across the labels on my x-axis by default which I'd like to replicate across my y-axis.
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
            cmap="Blues", annot_kws=annot_kws,
            mask=mask, vmin=0, vmax=5000,
            cbar_kws={"shrink": .8}, square=True,
            linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't seem to find an align parameter for setting the ticks and was wondering if and how this centering could be achieved in seaborn/matplotlib?
I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
                 columns=['Donald\nDuck', 'Mickey\nMouse', 'Han\nSolo',
                          'Luke\nSkywalker', 'Yoda', 'Santa\nClause', 'Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops at the end. They fetch the tick labels and set their horizontal alignment; the vertical alignment can be changed in the same way with set_va().
The code above produces this:
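If the individual lines of a multi-line y label should also be centred relative to each other, matplotlib's Text.set_multialignment may help. Below is a minimal sketch in plain matplotlib, with imshow standing in for the heatmap and made-up labels; it is not taken from the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

labels = ["DONALD\nDUCK", "YODA", "LUKE\nSKYWALKER"]  # made-up labels

fig, ax = plt.subplots()
ax.imshow(np.arange(9).reshape(3, 3), cmap="Blues")  # stand-in for the heatmap
ax.set_xticks(range(3))
ax.set_xticklabels(labels)
ax.set_yticks(range(3))
ax.set_yticklabels(labels)

for label in ax.get_yticklabels():
    label.set_ha("right")               # anchor the text block next to the axis
    label.set_multialignment("center")  # centre the lines within the block
    label.set_rotation(0)
```

Here set_ha decides where the whole text block sits relative to the tick, while set_multialignment only affects how the lines are arranged inside that block.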

How to change the order of these plots using zorder?

I'm trying to get a line plot to be over the bar plot. But no matter what I do to change the zorder, it seems like it keeps the bar on top of the line. Nothing I do to try to change zorder seems to work. Sometimes the bar plot just doesn't show up if zorder is <= 0.
import pandas as pd
import matplotlib.pyplot as plt
def tail_plot(tail):
    plt.figure()
    #line plot
    ax1 = incidence[incidence['actual_inc'] != 0].tail(tail).plot(x='date', y=['R_t', 'upper 95% CI', 'lower 95% CI'], color = ['b', '#808080', '#808080'])
    ax1.set_zorder(2)
    ax2 = ax1.twinx()
    inc = incidence[incidence['actual_inc'] != 0]['actual_inc'].tail(tail).values
    dates = incidence[incidence['actual_inc'] != 0]['date'].tail(tail).values
    #bar plot
    ax2.bar(dates, inc, color ='red', zorder=1)
    ax2.set_zorder(1)
Keeps giving me this:
The problem with the approach in the post is that ax1 has a white background which totally occludes the plot of ax2. To solve this, the background color can be set to 'none'.
Note that the plt.figure() in the example code of the post creates an empty plot because the pandas plot creates its own new figure (as no ax is given explicitly).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({f'curve {i}': 20 + np.random.normal(.1, .5, 30).cumsum() for i in range(1, 6)})
# line plot
ax1 = df.plot()
ax1.set_zorder(2)
ax1.set_facecolor('none')
ax2 = ax1.twinx()
# bar plot
x = np.arange(30)
ax2.bar(x, np.random.randint(7 + x, 2 * x + 10), color='red', zorder=1)
ax2.set_zorder(1)
plt.show()
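Applied to a function shaped like the one in the question, the two points above (transparent background, no stray empty figure) might look as follows. This is only a sketch: the incidence frame is made-up stand-in data, and the column name day is an assumption replacing the question's date column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for the question's `incidence` frame.
rng = np.random.default_rng(0)
incidence = pd.DataFrame({
    "day": np.arange(30),
    "R_t": 1 + rng.normal(0, 0.05, 30).cumsum(),
    "actual_inc": rng.integers(1, 50, 30),
})

def tail_plot(tail):
    data = incidence[incidence["actual_inc"] != 0].tail(tail)
    fig, ax1 = plt.subplots()                       # one figure, created explicitly
    data.plot(x="day", y="R_t", color="b", ax=ax1)  # pandas draws into ax1
    ax2 = ax1.twinx()
    ax2.bar(data["day"], data["actual_inc"], color="red")
    ax1.set_zorder(ax2.get_zorder() + 1)  # put the line axes above the bar axes
    ax1.set_facecolor("none")             # let the bars show through
    return fig, ax1, ax2

fig, ax1, ax2 = tail_plot(14)
```

Passing ax=ax1 to the pandas plot keeps everything in the figure created by plt.subplots(), so no empty extra figure appears.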

Visualize 1-dimensional data in a sequential colormap

I have a pandas series containing numbers ranging between 0 and 100. I want to visualise it in a horizontal bar consisting of 3 main colours.
I have tried using seaborn but all I can get is a heatmap matrix. I have also tried the below code, which is producing what I need but not in the way I need it.
x = my_column.values
y = x
t = x
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x, y, c=t, cmap='brg')
ax2.scatter(x, y, c=t, cmap='brg')
plt.show()
What I'm looking for is something similar to the below figure, how can I achieve that using matplotlib or seaborn?
The purpose of this is not quite clear. However, the following would produce an image like the one shown in the question:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
x = np.linspace(100,0,101)
fig, ax = plt.subplots(figsize=(6,1), constrained_layout=True)
cmap = LinearSegmentedColormap.from_list("", ["limegreen", "gold", "crimson"])
ax.imshow([x], cmap=cmap, aspect="auto",
          extent=[x[0]-np.diff(x)[0]/2, x[-1]+np.diff(x)[0]/2, 0, 1])
ax.tick_params(axis="y", left=False, labelleft=False)
plt.show()
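If the colours are needed for the actual values of the series rather than for a fixed gradient, the same colormap can be queried directly. Here is a minimal sketch, assuming the data range is 0 to 100 as stated in the question; the sample values are made up.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, Normalize, to_rgba

cmap = LinearSegmentedColormap.from_list("", ["limegreen", "gold", "crimson"])
norm = Normalize(vmin=0, vmax=100)  # map the 0..100 data range onto 0..1

values = np.array([0, 50, 100])     # made-up sample values
colors = cmap(norm(values))         # one RGBA row per value
print(colors)
```

The first and last rows are the end colours of the map, and values in between are interpolated through gold.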

"panel barchart" in matplotlib

I would like to produce a figure like this one using matplotlib:
(source: peltiertech.com)
My data are in a pandas DataFrame, and I've gotten as far as a regular stacked barchart, but I can't figure out how to do the part where each category is given its own y-axis baseline.
Ideally I would like the vertical scale to be exactly the same for all the subplots and move the panel labels off to the side so there can be no gaps between the rows.
I haven't exactly replicated what you want but this should get you pretty close.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
#create dummy data
cols = ['col'+str(i) for i in range(10)]
ind = ['ind'+str(i) for i in range(10)]
df = pd.DataFrame(np.random.normal(loc=10, scale=5, size=(10, 10)), index=ind, columns=cols)
#create plot
sns.set_style("whitegrid")
axs = df.plot(kind='bar', subplots=True, sharey=True,
              figsize=(6, 5), legend=False, yticks=[],
              grid=False, ylim=(0, 14), edgecolor='none',
              fontsize=14, color=[sns.xkcd_rgb["brownish red"]])
plt.text(-1, 100, "The y-axis label", fontsize=14, rotation=90) # add a y-label with custom positioning
sns.despine(left=True) # get rid of the axes
for ax in axs: # set the names beside the axes
    ax.lines[0].set_visible(False) # remove ugly dashed line
    ax.set_title('')
    sername = ax.get_legend_handles_labels()[1][0]
    ax.text(9.8, 5, sername, fontsize=14)
plt.suptitle("My panel chart", fontsize=18)
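For the two wishes in the question (identical vertical scale and no gaps between the rows), a plain matplotlib variant is sketched below. It is an alternative to the pandas call above, not a replication of it; the data, colour, and label positions are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: one panel per column.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(1, 10, size=(6, 4)),
                  index=[f"ind{i}" for i in range(6)],
                  columns=[f"col{i}" for i in range(4)])

# sharey gives every panel the same scale; hspace=0 removes the gaps.
fig, axs = plt.subplots(len(df.columns), 1, sharex=True, sharey=True,
                        figsize=(6, 5), gridspec_kw={"hspace": 0})
for ax, col in zip(axs, df.columns):
    ax.bar(df.index, df[col], color="firebrick")
    ax.set_ylim(0, 10.5)
    # Panel label to the right of the panel, in axes coordinates.
    ax.text(1.02, 0.5, col, transform=ax.transAxes, va="center")
fig.suptitle("My panel chart")
```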