Plotting period series in matplotlib pyplot - pandas

I'm trying to plot quarterly revenue time series data with matplotlib.pyplot but keep getting an error. When I run the code below, I get:
TypeError: Axis must have `freq` set to convert to Periods
Is it because time series dates expressed as periods cannot be plotted in matplotlib? Below is my code.
def parser(x):
    return pd.to_datetime(x, format='%m%Y')
tot = pd.read_table('C:/Desktop/data.txt', parse_dates=[2], index_col=[2], date_parser=parser)
tot = tot.dropna()
tot = tot.to_period('Q').reset_index().groupby(['origin', 'date'], as_index=False).agg(sum)
tot.head()
origin date rev
0 KY 2016Q2 1783.16
1 TN 2014Q1 32128.36
2 TN 2014Q2 16801.40
3 TN 2014Q3 33863.39
4 KY 2014Q4 103973.66
plt.plot(tot.date, tot.rev)

If you want to use matplotlib, the following code should give you the desired plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'origin': ['KY','TN','TN','TN','KY'],
                   'date': ['2016Q2','2014Q1','2014Q2','2014Q3','2014Q4'],
                   'rev': [1783.16, 32128.36, 16801.40, 33863.39, 103973.66]})
x = np.arange(0,len(df),1)
fig, ax = plt.subplots(1,1)
ax.plot(x,df['rev'])
ax.set_xticks(x)
ax.set_xticklabels(df['date'])
plt.show()
You could use the xticks command and represent the data with a bar chart with the following code:
plt.bar(range(len(df.rev)), df.rev, align='center')
plt.xticks(range(len(df.rev)), df.date, size='small')

This looks like a bug.
DataFrame.plot works for me:
tot.plot(x='date', y='rev')
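Another option, sketched here on the sample rows from the question rather than the asker's actual file, is to convert the period column to timestamps before handing it to matplotlib. The sort step and the choice of quarter-start timestamps are assumptions made for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; 'date' is stored as quarterly periods.
df = pd.DataFrame({
    "origin": ["KY", "TN", "TN", "TN", "KY"],
    "date": pd.PeriodIndex(
        ["2016Q2", "2014Q1", "2014Q2", "2014Q3", "2014Q4"], freq="Q"),
    "rev": [1783.16, 32128.36, 16801.40, 33863.39, 103973.66],
})

# Sort chronologically so the line is drawn left to right.
df = df.sort_values("date")

# Convert each period to the timestamp at the start of the quarter,
# which matplotlib can place on a date axis.
x = df["date"].dt.to_timestamp()

fig, ax = plt.subplots()
ax.plot(x, df["rev"], marker="o")
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue")
# Call plt.show() or fig.savefig(...) to view the figure.
```

If quarter labels such as 2014Q1 are preferred on the axis, df["date"].astype(str) can be used for the tick labels instead.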

Related

How to plot in pandas after groupby function

import pandas as pd
df = pd.read_excel(some data)
df2 = df.groupby(['Country', "Year"]).sum()
It looks like this:
Sales COGS Profit Month Number
Country Year
Canada 2013 3000
Canada 2014 3500
Other countries... other data
df3 = df2[[' Sales']]
I can plot it like this with the code:
df3.plot(kind="bar")
And it produces a chart
But I want a line chart instead, and a simple plot gives me:
I'm stuck on which one-liner will produce a chart with time on the x-axis and sales on the y-axis, with one line per country.
You have to unstack the Country level:
import matplotlib.pyplot as plt
df2 = df.groupby(['Country', 'Year'])['Sales'].sum().unstack('Country')
# Or df2.plot(title='Sales').set_xticks(df2.index)
ax = df2.plot(title='Sales')
ax.set_xticks(df2.index)
plt.show()
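To see what the unstack step does without the Excel file, here is a minimal sketch on made-up data. The Canada rows follow the table in the question; the France rows are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the Excel data.
df = pd.DataFrame({
    "Country": ["Canada", "Canada", "France", "France"],
    "Year": [2013, 2014, 2013, 2014],
    "Sales": [3000, 3500, 2000, 2600],
})

# After the groupby, (Country, Year) is a two-level index.
# unstack('Country') moves Country into the columns, leaving Year as the
# index, which is the shape DataFrame.plot needs for one line per country.
wide = df.groupby(["Country", "Year"])["Sales"].sum().unstack("Country")
print(wide)
```

Calling wide.plot() on this frame draws Year on the x-axis and one line per column.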

How to change a seaborn histogram plot to work for hours of the day?

I have a pandas dataframe with many time intervals of varying start times and lengths. I am interested in the distribution of start times over 24 hours, so I have another column, Hour, holding just the start hour. I have plotted a histogram using seaborn to look at the distribution, but the x axis starts at 0 and runs to 24. Is there a way to make it run from 8 to 8, wrapping around from 23 to 0, so it gives a better view of my data from a time perspective? Thanks in advance.
sns.distplot(df2['Hour'], bins = 24, kde = False).set(xlim=(0,23))
If you want to have a custom order of x-values on your bar plot, I'd suggest using matplotlib directly and plot your histogram simply as a bar plot with width=1 to get rid of padding between bars.
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# prepare sample data
dates = pd.date_range(
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 7),
    freq="H")
random_dates = np.random.choice(dates, 1000)
df = pd.DataFrame(data={"date":random_dates})
df["hour"] = df["date"].dt.hour
# set your preferred order of hours
hour_order = [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,0,1,2,3,4,5,6,7]
# calculate frequencies of each hour and sort them
plot_df = (
    df["hour"]
    .value_counts()
    .rename_axis("hour", axis=0)
    .reset_index(name="freq")
    .set_index("hour")
    .loc[hour_order]
    .reset_index())
# day / night colour split
day_mask = ((8 <= plot_df["hour"]) & (plot_df["hour"] <= 20))
plot_df["color"] = np.where(day_mask, "skyblue", "midnightblue")
# actual plotting - note that you have to cast hours as strings
fig = plt.figure(figsize=(8,4))
ax = fig.add_subplot(111)
ax.bar(
    x=plot_df["hour"].astype(str),
    height=plot_df["freq"],
    color=plot_df["color"], width=1)
ax.set_xlabel('Hour')
ax.set_ylabel('Frequency')
plt.show()
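One caveat with the approach above: in recent pandas versions, .loc[hour_order] raises a KeyError if some hour never occurs in the data. A small sketch of a variant that builds the wrapped order with modular arithmetic and fills missing hours with zero, using a made-up series of hours:

```python
import pandas as pd

# Hypothetical start hours; most hours of the day are absent on purpose.
hours = pd.Series([8, 8, 9, 23, 0, 0, 0, 7])

# 8, 9, ..., 23, 0, 1, ..., 7
hour_order = [(8 + i) % 24 for i in range(24)]

# reindex keeps the custom order and inserts 0 for hours with no rows.
counts = hours.value_counts().reindex(hour_order, fill_value=0)
print(counts)
```

The resulting counts can be passed to ax.bar in the same way as plot_df["freq"], with counts.index.astype(str) as the x values.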

How do I connect two sets of XY scatter values in MatPlotLib?

I am using Matplotlib to plot data from an Excel file as a scatter plot.
Here is a minimal sample table
In my scatter plot, I have two sets of XY values. In both sets, my X values are country population. I have Renewable Energy Consumed as my Y value in one set and Non-Renewable Energy Consumed in the other set.
For each Country, I would like to have a line from the renewable point to the non-renewable point.
My example code is as follows
import pandas as pd
import matplotlib.pyplot as plt
excel_file = 'example_graphs.xlsx'
datasheet = pd.read_excel(excel_file, sheet_name=0, index_col=0)
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
plt.show()
And it produces the following plot
I would love to be able to draw a line between the two sets of points, preferably a line I can change the thickness and color of.
As commented, you could simply loop over the dataframe and plot a line for each row.
import pandas as pd
import matplotlib.pyplot as plt
datasheet = pd.DataFrame({"Xcol" : [1,2,3],
                          "Y1col" : [25,50,75],
                          "Y2col" : [75,50,25]})
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
for n,row in datasheet.iterrows():
    ax.plot([row["Xcol"]]*2,row[["Y1col", "Y2col"]], color="limegreen", lw=3, zorder=0)
plt.show()
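Since both sets share the same X value per row, the connecting lines are all vertical, so Axes.vlines can draw them in one call instead of a loop. A minimal sketch on the same sample frame, with the color and width chosen arbitrarily:

```python
import matplotlib.pyplot as plt
import pandas as pd

datasheet = pd.DataFrame({"Xcol": [1, 2, 3],
                          "Y1col": [25, 50, 75],
                          "Y2col": [75, 50, 25]})

fig, ax = plt.subplots()
ax.scatter(datasheet["Xcol"], datasheet["Y1col"], c="b", label="set_one")
ax.scatter(datasheet["Xcol"], datasheet["Y2col"], c="r", label="set_two")

# One vertical segment per row, from the Y1 point to the Y2 point.
# zorder=0 keeps the lines underneath the markers.
lines = ax.vlines(datasheet["Xcol"], datasheet["Y1col"], datasheet["Y2col"],
                  colors="limegreen", linewidth=3, zorder=0)
ax.legend()
# Call plt.show() or fig.savefig(...) to view the figure.
```

If the two sets ever get different X values, the per-row ax.plot loop is the more general choice.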

Python rolling Sharpe ratio with Pandas or NumPy

I am trying to generate a plot of the 6-month rolling Sharpe ratio using Python with Pandas/NumPy.
My input data is below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
print(df.head(20))
returns
Date
2008-01-01 0.232794
2008-01-02 0.957157
2008-01-03 0.079939
2008-01-04 0.772999
2008-01-05 0.708377
2008-01-06 0.579662
2008-01-07 0.998632
2008-01-08 0.432605
2008-01-09 0.499041
2008-01-10 0.693420
2008-01-11 0.330222
2008-01-12 0.109280
2008-01-13 0.776309
2008-01-14 0.079325
2008-01-15 0.559206
2008-01-16 0.748133
2008-01-17 0.747319
2008-01-18 0.936322
2008-01-19 0.211246
2008-01-20 0.755340
What I want
The type of plot I am trying to produce is this or the first plot from here (see below).
My attempt
Here is the equation I am using:
def my_rolling_sharpe(y):
    return np.sqrt(126) * (y.mean() / y.std()) # 21 days per month X 6 months = 126
# Calculate rolling Sharpe ratio
df['rs'] = my_rolling_sharpe(df['returns'])
fig, ax = plt.subplots(figsize=(10, 3))
df['rs'].plot(style='-', lw=3, color='indianred', label='Sharpe')\
    .axhline(y = 0, color = "black", lw = 3)
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
fig.tight_layout()
plt.show()
The problem is that I am getting a horizontal line since my function is giving a single value for the Sharpe ratio. This value is the same for all the Dates. In the example plots, they appear to be showing many ratios.
Question
Is it possible to plot a 6-month rolling Sharpe ratio that changes from one day to the next?
Approximately correct solution using df.rolling and a fixed window size of 180 days:
df['rs'] = df['returns'].rolling('180d').apply(my_rolling_sharpe)
This window isn't exactly 6 calendar months wide because rolling requires a fixed window size, so trying window='6MS' (6 Month Starts) throws a ValueError.
To calculate the Sharpe ratio for a window exactly 6 calendar months wide, I'll copy this super cool answer by SO user Mike:
df['rs2'] = [my_rolling_sharpe(df.loc[d - pd.offsets.DateOffset(months=6):d, 'returns'])
for d in df.index]
# Compare the two windows
df.plot(y=['rs', 'rs2'], linewidth=0.5)
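To make the difference from the question's attempt concrete, here is a small self-contained sketch. Calling the function on the whole column yields one number, while passing it to rolling(...).apply yields one value per date. The seeded normal returns and min_periods=60 are assumptions for illustration; the sqrt(126) factor is carried over from the question:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2008-01-01", "2008-12-31", freq="D")
# Hypothetical daily returns, not the asker's data.
returns = pd.Series(rng.normal(0.0005, 0.01, idx.size), index=idx)

def sharpe(y):
    # Same formula as my_rolling_sharpe in the question.
    return np.sqrt(126) * y.mean() / y.std()

# Applied to the full series: a single scalar, hence the flat line.
whole = sharpe(returns)

# Applied per window: a series that changes from day to day.
# min_periods=60 avoids noisy values from very short early windows.
rolling = returns.rolling("180D", min_periods=60).apply(sharpe, raw=False)

print(whole)
print(rolling.dropna().head())
```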
Here is an alternative solution that relies solely on the window functions from pandas.
The Sharpe ratio is computed on the fly inside the rolling apply. Note the following parameters:
I have used a risk-free rate of 2%.
The dashed line is just a benchmark for the rolling Sharpe ratio, set at 1.6.
The code is the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
df['rolling_SR'] = df.returns.rolling(180).apply(lambda x: (x.mean() - 0.02) / x.std(), raw = True)
df.fillna(0, inplace = True)
df[df['rolling_SR'] > 0].rolling_SR.plot(style='-', lw=3, color='orange',
                                         label='Sharpe', figsize = (10,7))\
    .axhline(y = 1.6, color = "blue", lw = 3,
             linestyle = '--')
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
plt.show()
print('---------------------------------------------------------------')
print('In case you want to check the result data\n')
print(df.tail()) # I use tail because of the size of the window.
You should get something similar to this picture
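The same rolling ratio can also be written without apply, using the built-in rolling mean and standard deviation. One detail to watch: with raw=True the lambda receives a NumPy array, whose .std() uses ddof=0, whereas pandas' rolling std defaults to ddof=1. The sketch below, on seeded random data, passes ddof=0 so that both versions agree:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
r = pd.Series(rng.random(400))  # hypothetical returns
window, rf = 180, 0.02          # window and risk-free rate from the answer

# Version from the answer: NumPy std inside apply (ddof=0).
via_apply = r.rolling(window).apply(
    lambda x: (x.mean() - rf) / x.std(), raw=True)

# Vectorized version: ddof=0 is passed explicitly to match NumPy.
roll = r.rolling(window)
vectorized = (roll.mean() - rf) / roll.std(ddof=0)

print(np.allclose(via_apply.dropna(), vectorized.dropna()))
```

Dropping ddof=0 gives the sample standard deviation instead, which produces slightly smaller ratios for the same window.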

Matplotlib float values on the axis instead of integers

I have the following code, which produces the plot below. I can't get the fiscal year to display correctly on the x axis; the ticks show up as floats. I tried astype(int) and it didn't work. Any ideas on what I am doing wrong?
p1 = plt.bar(list(asset['FISCAL_YEAR']),list(asset['TOTAL']),align='center')
plt.show()
This is the plot:
In order to make sure only integer locations obtain a ticklabel, you may use a matplotlib.ticker.MultipleLocator with an integer number as argument.
To then format the numbers on the axes, you may use a matplotlib.ticker.StrMethodFormatter.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
df = pd.DataFrame({"FISCAL_YEAR" : np.arange(2000,2017),
                   'TOTAL' : np.random.rand(17)})
plt.bar(df['FISCAL_YEAR'],df['TOTAL'],align='center')
locator = matplotlib.ticker.MultipleLocator(2)
plt.gca().xaxis.set_major_locator(locator)
formatter = matplotlib.ticker.StrMethodFormatter("{x:.0f}")
plt.gca().xaxis.set_major_formatter(formatter)
plt.show()
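If a fixed step of 2 is not wanted, matplotlib.ticker.MaxNLocator with integer=True is another way to keep ticks on whole numbers while letting matplotlib pick the spacing. A minimal sketch with made-up totals:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

years = np.arange(2000, 2017)
totals = np.linspace(1.0, 2.0, years.size)  # hypothetical values

fig, ax = plt.subplots()
ax.bar(years, totals, align="center")

# Restrict major ticks on the x axis to integer positions.
ax.xaxis.set_major_locator(MaxNLocator(integer=True))

ticks = ax.get_xticks()
print(ticks)
# Call plt.show() or fig.savefig(...) to view the figure.
```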