Python rolling Sharpe ratio with Pandas or NumPy


I am trying to generate a plot of the 6-month rolling Sharpe ratio using Python with Pandas/NumPy.
My input data is below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
print(df.head(20))
returns
Date
2008-01-01 0.232794
2008-01-02 0.957157
2008-01-03 0.079939
2008-01-04 0.772999
2008-01-05 0.708377
2008-01-06 0.579662
2008-01-07 0.998632
2008-01-08 0.432605
2008-01-09 0.499041
2008-01-10 0.693420
2008-01-11 0.330222
2008-01-12 0.109280
2008-01-13 0.776309
2008-01-14 0.079325
2008-01-15 0.559206
2008-01-16 0.748133
2008-01-17 0.747319
2008-01-18 0.936322
2008-01-19 0.211246
2008-01-20 0.755340
What I want
The type of plot I am trying to produce is this or the first plot from here (see below).
My attempt
Here is the equation I am using:
def my_rolling_sharpe(y):
    return np.sqrt(126) * (y.mean() / y.std())  # 21 days per month x 6 months = 126
# Calculate rolling Sharpe ratio
df['rs'] = my_rolling_sharpe(df['returns'])
fig, ax = plt.subplots(figsize=(10, 3))
df['rs'].plot(style='-', lw=3, color='indianred', label='Sharpe')\
.axhline(y = 0, color = "black", lw = 3)
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
fig.tight_layout()
plt.show()
The problem is that I am getting a horizontal line since my function is giving a single value for the Sharpe ratio. This value is the same for all the Dates. In the example plots, they appear to be showing many ratios.
Question
Is it possible to plot a 6-month rolling Sharpe ratio that changes from one day to the next?

Approximately correct solution using df.rolling and a fixed window size of 180 days:
df['rs'] = df['returns'].rolling('180d').apply(my_rolling_sharpe)
This window isn't exactly 6 calendar months wide because rolling requires a fixed window size, so trying window='6MS' (6 Month Starts) throws a ValueError.
To calculate the Sharpe ratio for a window exactly 6 calendar months wide, I'll copy this super cool answer by SO user Mike:
df['rs2'] = [my_rolling_sharpe(df.loc[d - pd.offsets.DateOffset(months=6):d, 'returns'])
             for d in df.index]
# Compare the two windows
df.plot(y=['rs', 'rs2'], linewidth=0.5)
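The fixed-length window can also be computed without apply, by dividing a rolling mean by a rolling standard deviation. The following is only a minimal sketch and not part of the answer above: the function name rolling_sharpe, the 126-row window, the seed and the normally distributed sample returns are all assumptions chosen for illustration. The sqrt(126) scale factor is kept from the question.

```python
import numpy as np
import pandas as pd

# Illustrative sample data: one random "return" per calendar day
rng = np.random.default_rng(0)
idx = pd.date_range("2008-01-01", "2015-12-01", freq="D")
returns = pd.Series(rng.normal(0.0005, 0.01, idx.size), index=idx, name="returns")

def rolling_sharpe(r, window=126, scale=np.sqrt(126)):
    """Rolling mean divided by rolling std over `window` rows, times `scale`."""
    roll = r.rolling(window)  # min_periods defaults to the window length
    return scale * roll.mean() / roll.std()  # rolling std uses ddof=1, like Series.std()

rs = rolling_sharpe(returns)
```

The first 125 values are NaN because the window is not yet full. Passing min_periods to rolling would let the series start earlier, at the cost of noisier early estimates.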

Here is an alternative solution that relies solely on the pandas window functions.
The Sharpe ratio is calculated inline in a lambda. Note the following parameters:
I used a risk-free rate of 2%.
The dashed line is a benchmark for the rolling Sharpe ratio, set at 1.6.
The code is as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
df['rolling_SR'] = df.returns.rolling(180).apply(lambda x: (x.mean() - 0.02) / x.std(), raw = True)
df.fillna(0, inplace = True)
df[df['rolling_SR'] > 0].rolling_SR.plot(style='-', lw=3, color='orange',
                                         label='Sharpe', figsize = (10,7))\
    .axhline(y = 1.6, color = "blue", lw = 3,
             linestyle = '--')
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
plt.show()
print('---------------------------------------------------------------')
print('In case you want to check the result data\n')
print(df.tail()) # I use tail because of the size of your window.
You should get something similar to this picture
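In the snippet above, the 2% is subtracted directly from each daily return. If that rate is meant as an annual figure, one common convention is to convert it to a per-day rate before subtracting and to annualize with the square root of the number of periods per year. The sketch below assumes 252 trading days per year and a 126-row window. Both values, the sample data and all variable names are assumptions, not taken from the answer.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252   # assumed number of return periods per year
ANNUAL_RF = 0.02     # the 2% risk-free rate, interpreted here as an annual rate

# Illustrative sample data on business days
rng = np.random.default_rng(1)
idx = pd.bdate_range("2014-01-01", periods=500)
returns = pd.Series(rng.normal(0.0004, 0.01, idx.size), index=idx)

excess = returns - ANNUAL_RF / TRADING_DAYS   # per-day excess return
roll = excess.rolling(126)                    # roughly six months of trading days
rolling_sr = np.sqrt(TRADING_DAYS) * roll.mean() / roll.std()
```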

Related

Calculating time-series as a percentage of total

I'm looking at county-level procurement data (millions of bills) and plotting time-series with matplotlib and pandas using groupby:
dataframe_slice.groupby(pd.Grouper(freq='1M')).bill_amount.sum().plot()
where bill_amount is a column of floats that shows how much was billed. How can I change the graph to show the dataframe_slice as a percentage of total dataframe bill_amount?

I am not aware of an out-of-the-box pandas function for that (but I am hoping to be proven wrong). Imho, you have to calculate the percentage per group by calculating the total_sum, then determining the percentage per group aggregation. Stand-alone code:
import pandas as pd
from matplotlib import pyplot as plt
#fake data generation
import numpy as np
np.random.seed(123)
n = 200
start = pd.to_datetime("2017-07-17")
end = pd.to_datetime("2018-04-03")
ndays = (end - start).days + 1
date_range = pd.to_timedelta(np.random.rand(n) * ndays, unit="D") + start
df = pd.DataFrame({"ind": date_range,
"bill_amount": np.random.randint(10, 30, n),
"cat": np.random.choice(["X", "Y", "Z"], n)})
df.set_index("ind", inplace=True)
#df.sort_index(inplace=True)
#this assumes that your dataframe has a datetime index
#here starts the actual calculation
total_sum = df.bill_amount.sum()
dataframe_slice = df.groupby(pd.Grouper(freq='1M')).bill_amount.sum().div(total_sum)*100
dataframe_slice.plot()
#and we beautify the plot
plt.xlabel("Month of expenditure")
plt.ylabel("Percentage of expenditure")
plt.tight_layout()
plt.show()
Sample output:
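The question can also be read as asking for the share of each month's total that comes from a slice of the data, rather than each month's share of the grand total. A minimal sketch of that reading follows. The slice definition (category "X"), the sample data and the variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Illustrative data: one bill per day with a random category
rng = np.random.default_rng(123)
idx = pd.date_range("2017-07-01", "2018-03-31", freq="D")
df = pd.DataFrame({"bill_amount": rng.integers(10, 30, idx.size),
                   "cat": rng.choice(["X", "Y", "Z"], idx.size)}, index=idx)

dataframe_slice = df[df["cat"] == "X"]   # hypothetical slice of the full frame

monthly_total = df["bill_amount"].resample("MS").sum()
monthly_slice = dataframe_slice["bill_amount"].resample("MS").sum()

# Percentage of each month's total that was billed within the slice;
# div() aligns on the month index, so a month missing from the slice becomes NaN
share = monthly_slice.div(monthly_total).mul(100)
```

Calling share.plot() then draws the percentage series in the same way as the plot in the answer.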

How to change a seaborn histogram plot to work for hours of the day?

I have a pandas dataframe with lots of time intervals of varying start times and lengths. I am interested in the distribution of start times over 24hours. I therefore have another column entitled Hour with just that in. I have plotted a histogram using seaborn to look at the distribution but obviously the x axis starts at 0 and runs to 24. I wonder if there is a way to change so it runs from 8 to 8 and loops over at 23 to 0 so it provides a better visualisation of my data from a time perspective. Thanks in advance.
sns.distplot(df2['Hour'], bins = 24, kde = False).set(xlim=(0,23))

If you want to have a custom order of x-values on your bar plot, I'd suggest using matplotlib directly and plot your histogram simply as a bar plot with width=1 to get rid of padding between bars.
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# prepare sample data
dates = pd.date_range(
start=datetime(2020, 1, 1),
end=datetime(2020, 1, 7),
freq="H")
random_dates = np.random.choice(dates, 1000)
df = pd.DataFrame(data={"date":random_dates})
df["hour"] = df["date"].dt.hour
# set your preferred order of hours
hour_order = [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7]
# calculate frequencies of each hour and sort them
plot_df = (
df["hour"]
.value_counts()
.rename_axis("hour", axis=0)
.reset_index(name="freq")
.set_index("hour")
.loc[hour_order]
.reset_index())
# day / night colour split
day_mask = ((8 <= plot_df["hour"]) & (plot_df["hour"] <= 20))
plot_df["color"] = np.where(day_mask, "skyblue", "midnightblue")
# actual plotting - note that you have to cast hours as strings
fig = plt.figure(figsize=(8,4))
ax = fig.add_subplot(111)
ax.bar(
x=plot_df["hour"].astype(str),
height=plot_df["freq"],
color=plot_df["color"], width=1)
ax.set_xlabel('Hour')
ax.set_ylabel('Frequency')
plt.show()
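One caveat with selecting by .loc[hour_order]: in recent pandas versions, .loc raises a KeyError when a listed label is missing, which happens if some hour never occurs in the data. A hedged alternative is reindex with fill_value=0, sketched below. The sample data deliberately contains only hours 9 to 17, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = pd.Series(rng.integers(9, 18, 200), name="hour")   # only hours 9..17 occur

# Preferred display order: 8, 9, ..., 23, 0, 1, ..., 7
hour_order = list(range(8, 24)) + list(range(0, 8))

# reindex inserts hours with no observations as zero counts instead of failing
freq = hours.value_counts().reindex(hour_order, fill_value=0)
```

The resulting freq series can be passed to ax.bar in the same way as plot_df["freq"] above.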

How to plot a time series having only business days without jumps between the missing days [duplicate]

ax.plot_date((dates, dates), (highs, lows), '-')
I'm currently using this command to plot financial highs and lows using Matplotlib. It works great, but how do I remove the blank spaces in the x-axis left by days without market data, such as weekends and holidays?
I have lists of dates, highs, lows, closes and opens. I can't find any examples of creating a graph with an x-axis that shows dates but doesn't enforce a constant scale.

One of the advertised features of scikits.timeseries is "Create time series plots with intelligently spaced axis labels".
You can see some example plots here. In the first example (shown below) the 'business' frequency is used for the data, which automatically excludes holidays and weekends and the like. It also masks missing data points, which you see as gaps in this plot, rather than linearly interpolating them.

Up to date answer (2018) with Matplotlib 2.1.2, Python 2.7.12
The function equidate_ax handles everything you need for a simple date x-axis with equidistant spacing of data points. Realised with ticker.FuncFormatter based on this example.
from __future__ import division
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import datetime
def equidate_ax(fig, ax, dates, fmt="%Y-%m-%d", label="Date"):
    """
    Sets all relevant parameters for an equidistant date-x-axis.
    Tick Locators are not affected (set automatically)
    Args:
        fig: pyplot.figure instance
        ax: pyplot.axis instance (target axis)
        dates: iterable of datetime.date or datetime.datetime instances
        fmt: Display format of dates
        label: x-axis label
    Returns:
        None
    """
    N = len(dates)
    def format_date(index, pos):
        index = np.clip(int(index + 0.5), 0, N - 1)
        return dates[index].strftime(fmt)
    ax.xaxis.set_major_formatter(FuncFormatter(format_date))
    ax.set_xlabel(label)
    fig.autofmt_xdate()
#
# Some test data (with python dates)
#
dates = [datetime.datetime(year, month, day) for year, month, day in [
(2018,2,1), (2018,2,2), (2018,2,5), (2018,2,6), (2018,2,7), (2018,2,28)
]]
y = np.arange(6)
# Create plots. Left plot is default with a gap
fig, [ax1, ax2] = plt.subplots(1, 2)
ax1.plot(dates, y, 'o-')
ax1.set_title("Default")
ax1.set_xlabel("Date")
# Right plot will show equidistant series
# x-axis must be the indices of your dates-list
x = np.arange(len(dates))
ax2.plot(x, y, 'o-')
ax2.set_title("Equidistant Placement")
equidate_ax(fig, ax2, dates)

I think you need to "artificially synthesize" the exact form of plot you want by using xticks to set the tick labels to the strings representing the dates (of course placing the ticks at equispaced intervals even though the dates you're representing aren't equispaced) and then using a plain plot.
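A minimal sketch of that idea: plot against integer positions, then label those positions with the date strings. The sample dates and values are made up, and the Agg backend line is only an assumption so the snippet runs without a display.

```python
import datetime
import matplotlib
matplotlib.use("Agg")   # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt

# Unevenly spaced dates (weekend and a long gap) with made-up values
dates = [datetime.date(2018, 2, d) for d in (1, 2, 5, 6, 7, 28)]
values = [0, 1, 2, 3, 4, 5]

positions = list(range(len(dates)))   # equispaced x positions, one per data point
fig, ax = plt.subplots()
ax.plot(positions, values, "o-")
ax.set_xticks(positions)
ax.set_xticklabels([d.strftime("%Y-%m-%d") for d in dates], rotation=45, ha="right")
fig.tight_layout()
```

With many data points, labelling only every n-th position keeps the axis readable.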

I will typically use NumPy's NaN (not a number) for values that are invalid or not present. They are represented by Matplotlib as gaps in the plot and NumPy is part of pylab/Matplotlib.
>>> import pylab
>>> xs = pylab.arange(10.) + 733632. # valid date range
>>> ys = [1,2,3,2,pylab.nan,2,3,2,5,2.4] # some data (one undefined)
>>> pylab.plot_date(xs, ys, ydate=False, linestyle='-', marker='')
[<matplotlib.lines.Line2D instance at 0x0378D418>]
>>> pylab.show()

I ran into this problem again and was able to create a decent function to handle this issue, especially concerning intraday datetimes. Credit to @Primer for this answer.
import numpy as np
import matplotlib.pyplot as plt
def plot_ts(ts, step=5, figsize=(10,7), title=''):
    """
    plot timeseries ignoring date gaps
    Params
    ------
    ts : pd.DataFrame or pd.Series
    step : int, display interval for ticks
    figsize : tuple, figure size
    title: str
    """
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(range(ts.dropna().shape[0]), ts.dropna())
    ax.set_title(title)
    ax.set_xticks(np.arange(len(ts.dropna())))
    ax.set_xticklabels(ts.dropna().index.tolist())
    # tick visibility, can be slow for 200,000+ ticks
    xticklabels = ax.get_xticklabels() # generate list once to speed up function
    for i, label in enumerate(xticklabels):
        if i % step != 0:
            label.set_visible(False)
    fig.autofmt_xdate()

You can simply change dates to strings:
import matplotlib.pyplot as plt
import datetime
f = plt.figure(1, figsize=(10,5))
ax = f.add_subplot(111)
today = datetime.datetime.today().date()
yesterday = today - datetime.timedelta(days=1)
three_days_later = today + datetime.timedelta(days=3)
x_values = [yesterday, today, three_days_later]
y_values = [75, 80, 90]
x_values = [f'{x:%Y-%m-%d}' for x in x_values]
ax.bar(x_values, y_values, color='green')
plt.show()

scikits.timeseries functionality has largely been moved to pandas, so you can now resample a dataframe to only include the values on weekdays.
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> s = pd.Series(list(range(10)), pd.date_range('2015-09-01','2015-09-10'))
>>> s
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-05 4
2015-09-06 5
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
>>> s.resample('B', label='right', closed='right').last()
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
and then to plot the dataframe as normal
s.resample('B', label='right', closed='right').last().plot()
plt.show()

Just use mplfinance
https://github.com/matplotlib/mplfinance
import mplfinance as mpf
# df = 'ohlc dataframe'
mpf.plot(df)

Plotting period series in matplotlib pyplot

I'm trying to plot time-series revenue data by quarter with matplotlib.pyplot but keep getting an error. The desired behavior is to plot the revenue data by quarter using matplotlib. When I try to do this, I get:
TypeError: Axis must have freq set to convert to Periods
Is it because time-series dates expressed as periods cannot be plotted in matplotlib? Below is my code.
def parser(x):
    return pd.to_datetime(x, format='%m%Y')
tot = pd.read_table('C:/Desktop/data.txt', parse_dates=[2], index_col=[2], date_parser=parser)
tot = tot.dropna()
tot = tot.to_period('Q').reset_index().groupby(['origin', 'date'], as_index=False).agg(sum)
tot.head()
  origin    date        rev
0     KY  2016Q2    1783.16
1     TN  2014Q1   32128.36
2     TN  2014Q2   16801.40
3     TN  2014Q3   33863.39
4     KY  2014Q4  103973.66
plt.plot(tot.date, tot.rev)

If you want to use matplotlib, the following code should give you the desired plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'origin': ['KY','TN','TN','TN','KY'],
'date': ['2016Q2','2014Q1','2014Q2','2014Q3','2014Q4'],
'rev': [1783.16, 32128.36, 16801.40, 33863.39, 103973.66]})
x = np.arange(0,len(df),1)
fig, ax = plt.subplots(1,1)
ax.plot(x,df['rev'])
ax.set_xticks(x)
ax.set_xticklabels(df['date'])
plt.show()
You could use the xticks command and represent the data with a bar chart with the following code:
plt.bar(range(len(df.rev)), df.rev, align='center')
plt.xticks(range(len(df.rev)), df.date, size='small')

This looks like a bug.
DataFrame.plot works for me:
ooc.plot(x='date', y='rev')
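If the goal is to keep calling matplotlib directly, another option is to convert the periods to timestamps before plotting, so that matplotlib receives ordinary datetimes. This is a hedged sketch, not something taken from the answers above. It rebuilds the sample rows from the question and assumes the date column has a quarterly period dtype.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt

# Sample rows from the question, with the dates stored as quarterly periods
tot = pd.DataFrame({
    "origin": ["KY", "TN", "TN", "TN", "KY"],
    "date": pd.PeriodIndex(["2016Q2", "2014Q1", "2014Q2", "2014Q3", "2014Q4"], freq="Q"),
    "rev": [1783.16, 32128.36, 16801.40, 33863.39, 103973.66],
}).sort_values("date")

x = tot["date"].dt.to_timestamp()   # first day of each quarter as a Timestamp
fig, ax = plt.subplots()
ax.plot(x, tot["rev"], marker="o")
ax.set_xlabel("Quarter start")
ax.set_ylabel("Revenue")
```

Sorting by date first avoids the line doubling back, since the rows in the question are not in chronological order.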

Pandas bar plot changes date format

I have a simple stacked line plot that has exactly the date format I want magically set when using the following code.
df_ts = df.resample("W").max()
df_ts.plot(figsize=(12,8), stacked=True)
However, the dates mysteriously transform themselves to an ugly and unreadable format when plotting the same data as a bar plot.
df_ts = df.resample("W").max()
df_ts.plot(kind='bar', figsize=(12,8), stacked=True)
The original data was transformed a bit to have the weekly max. Why is this radical change in automatically set dates happening? How can I have the nicely formatted dates as above?
Here is some dummy data
start = pd.to_datetime("1-1-2012")
idx = pd.date_range(start, periods= 365).tolist()
df=pd.DataFrame({'A':np.random.random(365), 'B':np.random.random(365)})
df.index = idx
df_ts = df.resample('W').max()
df_ts.plot(kind='bar', stacked=True)

The plotting code assumes that each bar in a bar plot deserves its own label.
You could override this assumption by specifying your own formatter:
ax.xaxis.set_major_formatter(formatter)
The pandas.tseries.converter.TimeSeries_DateFormatter that Pandas uses to format the dates in the "good" plot works well with line plots when the x-values are dates. However, with a bar plot the x-values (at least those received by TimeSeries_DateFormatter.__call__) are merely integers starting at zero. If you try to use TimeSeries_DateFormatter with a bar plot, all the labels thus start at the Epoch, 1970-1-1 UTC, since this is the date which corresponds to zero. So the formatter used for line plots is unfortunately useless for bar plots (at least as far as I can see).
The easiest way I see to produce the desired formatting is to generate and set the labels explicitly:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
start = pd.to_datetime("5-1-2012")
idx = pd.date_range(start, periods=365)
df = pd.DataFrame({'A': np.random.random(365), 'B': np.random.random(365)})
df.index = idx
df_ts = df.resample('W').max()
ax = df_ts.plot(kind='bar', stacked=True)
# Make most of the ticklabels empty so the labels don't get too crowded
ticklabels = ['']*len(df_ts.index)
# Every 4th ticklabel shows the month and day
ticklabels[::4] = [item.strftime('%b %d') for item in df_ts.index[::4]]
# Every 12th ticklabel includes the year
ticklabels[::12] = [item.strftime('%b %d\n%Y') for item in df_ts.index[::12]]
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
plt.gcf().autofmt_xdate()
plt.show()
yields
For those looking for a simple example of a bar plot with dates:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
dates = pd.date_range('2012-1-1', '2017-1-1', freq='M')
df = pd.DataFrame({'A':np.random.random(len(dates)), 'Date':dates})
fig, ax = plt.subplots()
df.plot.bar(x='Date', y='A', ax=ax)
ticklabels = ['']*len(df)
skip = len(df)//12
ticklabels[::skip] = df['Date'].iloc[::skip].dt.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(mticker.FixedFormatter(ticklabels))
fig.autofmt_xdate()
# fixes the tracker
# https://matplotlib.org/users/recipes.html
def fmt(x, pos=0, max_i=len(ticklabels)-1):
    i = int(x)
    i = 0 if i < 0 else max_i if i > max_i else i
    return dates[i]
ax.fmt_xdata = fmt
plt.show()

I've struggled with this problem too, and after reading several posts came up with the following solution, which seems to me slightly clearer than matplotlib.dates approach.
Labels without modification:
# Use DatetimeIndex instead of date_range for pandas earlier than 1.0.0 version
timeline = pd.date_range(start='2018, November', freq='M', periods=15)
df = pd.DataFrame({'date': timeline, 'value': np.random.randn(15)})
df.set_index('date', inplace=True)
df.plot(kind='bar', figsize=(12, 8), color='#2ecc71')
Labels with modification:
def line_format(label):
    """
    Convert time label to the format of pandas line plot
    """
    month = label.month_name()[:3]
    if month == 'Jan':
        month += f'\n{label.year}'
    return month
# Note that we specify rot here
ax = df.plot(kind='bar', figsize=(12, 8), color='#2ecc71', rot=0)
ax.set_xticklabels(map(line_format, df.index))
This approach adds the year to the label only when the month is January.

Here's an easy approach with pandas plot() and without using matplotlib dates:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# generate sample data
start = pd.to_datetime("1-1-2012")
index = pd.date_range(start, periods= 365)
df = pd.DataFrame({'A' : np.random.random(365), 'B' : np.random.random(365)}, index=index)
# resample to any timeframe you need, e.g. months
df_months = df.resample("M").sum()
# plot
fig, ax = plt.subplots()
df_months.plot(kind="bar", figsize=(16,5), stacked=True, ax=ax)
# format xtick-labels with list comprehension
ax.set_xticklabels([x.strftime("%Y-%m") for x in df_months.index], rotation=45)
plt.show()

How to get nicely formatted dates like the pandas line plot
The issue is that the pandas bar plot processes the date variable as a categorical variable where each date is considered to be a unique category, so the x-axis units are set to integers starting at 0 (like the default DataFrame index when none is assigned) and the full string of each date is shown without any automatic formatting.
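The integer positions are easy to confirm by inspecting the ticks of a bar plot. A small sketch under assumed sample data (eight weekly timestamps, one column); the Agg backend line is only there so the snippet runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")   # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt

idx = pd.date_range("2012-01-01", periods=8, freq="W")   # weekly, on Sundays
df = pd.DataFrame({"A": np.arange(8.0)}, index=idx)

fig, (ax_line, ax_bar) = plt.subplots(1, 2)
df.plot(ax=ax_line)             # line plot: the x-axis uses date units
df.plot(kind="bar", ax=ax_bar)  # bar plot: the x-axis uses positions 0..7

bar_ticks = ax_bar.get_xticks()                                  # one integer per bar
bar_labels = [t.get_text() for t in ax_bar.get_xticklabels()]    # one date string per bar
```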
Here are two solutions to format the date tick labels of a pandas (stacked) bar chart of a time series:
The first is a variation of the answer by unutbu and is made to better fit the data shown in the question;
The second is a generalized solution that lets you use matplotlib date tick locators and formatters which produces appropriate date labels for time series of any type of frequency.
But first, let's see what the nicely formatted tick labels look like when the sample data is plotted with a pandas line plot.
Default pandas line plot date formatting
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.dates as mdates # v 3.3.2
# Create sample dataset with a daily frequency and resample it to a weekly frequency
rng = np.random.default_rng(seed=123) # random number generator
idx = pd.date_range(start='2012-01-01', end='2013-12-31', freq='D')
df_raw = pd.DataFrame(rng.random(size=(idx.size, 3)),
index=idx, columns=list('ABC'))
df = df_raw.resample('W').sum() # default is 'W-SUN'
# Create pandas stacked line plot
ax = df.plot(stacked=True, figsize=(10,5))
Because the data is grouped by week with timestamps for Sundays (frequency W-SUN), the monthly tick labels are not necessarily placed on the first day of the month and there can be 3 or 4 weeks between each first week of the month so the minor ticks are unevenly spaced (noticeable if you look closely). Here are the exact dates of the major ticks:
# Convert major x ticks to date labels
np.array([mdates.num2date(tick*7-4).strftime('%Y-%b-%d') for tick in ax.get_xticks()])
"""
array(['2012-Jan-01', '2012-Apr-01', '2012-Jul-01', '2012-Oct-07',
'2013-Jan-06', '2013-Apr-07', '2013-Jul-07', '2013-Oct-06',
'2014-Jan-05'], dtype='<U11')
"""
The challenge lies in selecting the ticks for each first week of the month seeing as they are unequally spaced. Other answers have provided simple solutions based on a fixed tick frequency which produces oddly spaced labels in terms of dates where the months can be sometimes repeated (for example the month of July in unutbu's answer). Or they have provided solutions based on a monthly time series instead of a weekly time series, which is simpler to format seeing as there are always 12 months per year. So here is a solution that gives nicely formatted tick labels like in the pandas line plot and that works for any frequency of data.
Solution 1: pandas bar plot with tick labels based on the DatetimeIndex
# Create pandas stacked bar chart
ax = df.plot.bar(stacked=True, figsize=(10,5))
# Create list of monthly timestamps by selecting the first weekly timestamp of each
# month (in this example, the first Sunday of each month)
monthly_timestamps = [timestamp for idx, timestamp in enumerate(df.index)
if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Automatically select appropriate number of timestamps so that x-axis does
# not get overcrowded with tick labels
step = 1
while len(monthly_timestamps[::step]) > 10: # increase number if time range >3 years
    step += 1
timestamps = monthly_timestamps[::step]
# Create tick labels from timestamps
labels = [ts.strftime('%b\n%Y') if ts.year != timestamps[idx-1].year
else ts.strftime('%b') for idx, ts in enumerate(timestamps)]
# Set major ticks and labels
ax.set_xticks([df.index.get_loc(ts) for ts in timestamps])
ax.set_xticklabels(labels)
# Set minor ticks without labels
ax.set_xticks([df.index.get_loc(ts) for ts in monthly_timestamps], minor=True)
# Rotate and center labels
ax.figure.autofmt_xdate(rotation=0, ha='center')
To my knowledge, there is no way of getting this exact label formatting with the matplotlib.dates (mdates) tick locators and formatters. Nevertheless, combining mdates functionalities with a pandas stacked bar plot can come in handy if you prefer using tick locators/formatters or if you want to have dynamic ticks when using the interactive interface of matplotlib (to pan/zoom in and out).
At this point, it may be useful to consider creating the stacked bar plot in matplotlib directly, where you need to loop through the variables to create the stacked bar. The pandas-based solution shown below works by looping through the patches of the bars to relocate them according to matplotlib date units. So it is basically one loop instead of another, up to you to see which is more convenient.
Solution 2: pandas bar plot with matplotlib tick locators and formatters
This generalized solution uses the mdates AutoDateLocator which places ticks at the beginning of months/years. If you generate data and timestamps with pd.date_range in pandas (like in this example), you should keep in mind that the commonly used 'M' and 'Y' frequencies produce timestamps for the end date of the periods. The code given in the following example aligns monthly/yearly tick marks with 'MS' and 'YS' frequencies.
If you import a dataset using end-of-period dates (or some other type of pandas frequency not aligned with AutoDateLocator ticks), I am not aware of any convenient way to shift the AutoDateLocator accordingly so that the labels become correctly aligned with the bars. I see two options: i) resample the data using df.resample('MS').sum() if that does not cause any issue regarding the meaning of the underlying data; ii) or else use another date locator.
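To make the alignment issue concrete, here is a short sketch of moving month-end timestamps to month starts, either by resampling with 'MS' or by a round trip through monthly periods. The three sample dates and the variable names are assumptions for illustration.

```python
import pandas as pd

# Month-end timestamps, as produced by end-of-period frequencies
month_ends = pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"])
s = pd.Series([1.0, 2.0, 3.0], index=month_ends)

# Option i) from the text: resample so each bin is labelled by its month start
by_month_start = s.resample("MS").sum()

# Alternative: relabel the index without aggregating, via monthly periods
realigned = month_ends.to_period("M").to_timestamp()   # first day of each month
```

Both results carry timestamps on the first day of each month, which is where month-based tick locators place their ticks.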
This issue causes no problem in the following example seeing as the data has a week end frequency 'W-SUN' so the monthly/yearly labels placed at a month/year start frequency are fine.
# Create pandas stacked bar chart with the default bar width = 0.5
ax = df.plot.bar(stacked=True, figsize=(10,5))
# Compute width of bars in matplotlib date units, 'md' (in days) and adjust it if
# the bar width in df.plot.bar has been set to something else than the default 0.5
bar_width_md_default, = np.diff(mdates.date2num(df.index[:2]))/2
bar_width = ax.patches[0].get_width()
bar_width_md = bar_width*bar_width_md_default/0.5
# Compute new x values in matplotlib date units for the patches (rectangles) that
# make up the stacked bars, adjusting the positions according to the bar width:
# if the frequency is in months (or years), the bars may not always be perfectly
# centered over the tick marks depending on the number of days difference between
# the months (or years) given by df.index[0] and [1] used to compute the bar
# width, this should not be noticeable if the bars are wide enough.
x_bars_md = mdates.date2num(df.index) - bar_width_md/2
nvar = len(ax.get_legend_handles_labels()[1])
x_patches_md = np.ravel(nvar*[x_bars_md])
# Set bars to new x positions and adjust width: this loop works fine with NaN
# values as well because in bar plot NaNs are drawn with a rectangle of 0 height
# located at the foot of the bar, you can verify this with patch.get_bbox()
for patch, x_md in zip(ax.patches, x_patches_md):
    patch.set_x(x_md)
    patch.set_width(bar_width_md)
# Set major ticks
maj_loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(maj_loc)
# Show minor tick under each bar (instead of each month) to highlight
# discrepancy between major tick locator and bar positions seeing as no tick
# locator is available for first-week-of-the-month frequency
ax.set_xticks(x_bars_md + bar_width_md/2, minor=True)
# Set major tick formatter
zfmts = ['', '%b\n%Y', '%b', '%b-%d', '%H:%M', '%H:%M']
fmt = mdates.ConciseDateFormatter(maj_loc, zero_formats=zfmts, show_offset=False)
ax.xaxis.set_major_formatter(fmt)
# Shift the plot frame to where the bars are now located
xmin = min(x_bars_md) - bar_width_md
xmax = max(x_bars_md) + 2*bar_width_md
ax.set_xlim(xmin, xmax)
# Adjust tick label format last, else it may sometimes not be applied correctly
ax.figure.autofmt_xdate(rotation=0, ha='center')
Minor ticks are displayed under each bar to highlight the fact that the timestamps of the bars often do not coincide with a month/year start marked by the labels of the AutoDateLocator ticks. I am not aware of any date locator that can be used to select ticks for the first week of each month and reproduce exactly the result shown in solution 1.
Documentation: date format codes, mdates.ConciseDateFormatter

Here's a possibly easier approach using mdates, though requires you to loop over your columns, calling bar plot from matplotlib. Here's an example where I plot just one column and use mdates for customized ticks and labels (EDIT Added looping function to plot all columns stacked):
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def format_x_date_month_day(ax):
    # Standard date x-axis formatting block, labels each month and ticks each day
    days = mdates.DayLocator()
    months = mdates.MonthLocator() # every month
    dayFmt = mdates.DateFormatter('%D')
    monthFmt = mdates.DateFormatter('%Y-%m')
    ax.figure.autofmt_xdate()
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(monthFmt)
    ax.xaxis.set_minor_locator(days)
def df_stacked_bar_formattable(df, ax, **kwargs):
    P = []
    lastBar = None
    for col in df.columns:
        X = df.index
        Y = df[col]
        if lastBar is not None:
            P.append(ax.bar(X, Y, bottom=lastBar, **kwargs))
            # accumulate heights so a third column stacks on top of the first two
            lastBar = lastBar + Y
        else:
            P.append(ax.bar(X, Y, **kwargs))
            lastBar = Y
    plt.legend([p[0] for p in P], df.columns)
span_days = 90
start = pd.to_datetime("1-1-2012")
idx = pd.date_range(start, periods=span_days).tolist()
df=pd.DataFrame(index=idx, data={'A':np.random.random(span_days), 'B':np.random.random(span_days)})
plt.close('all')
fig, ax = plt.subplots(1)
df_stacked_bar_formattable(df, ax)
format_x_date_month_day(ax)
plt.show()
(Referencing matplotlib.org for example of looping to create a stacked bar plot.) This gives us
Another approach that should work and be much easier is to use df.plot.bar(ax=ax, stacked=True), however it does not admit date axis formatting with mdates and is the subject of my question.

Maybe not the most elegant, but hopefully easy way:
fig = plt.figure()
ax = fig.add_subplot(111)
df_ts.plot(kind='bar', figsize=(12,8), stacked=True,ax=ax)
ax.set_xticklabels([''] * len(df_ts.index))
df_ts.plot(linewidth=0, ax=ax) # This sets the nice x_ticks automatically
[EDIT]: ax=ax is needed in df_ts.plot()
