pandas.groupby --> DatetimeIndex --> groupby year - pandas

I come from JavaScript and am struggling. I need to sort the data by its DatetimeIndex and then group it by year.
The CSV looks like this (I shortened it because it has more than 1300 entries):
date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539
I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')
# Use a decimal point: `2,5` builds a tuple, so the original sliced at 2% rather than 2.5%.
df = df.iloc[int(round(len(df) * 0.025)):int(round(len(df) * 0.975)) - 1]
df = df.sort_index()
Now I need to group my DatetimeIndex by year so that I can plot it with matplotlib. This is where I struggle:
def draw_bar_plot():
    df_bar = df
    fig, ax = plt.subplots()
    fig.figure.savefig('bar_plot.png')
    return fig
I really don't know how to group by year.
Doing something like:
print(df_bar.groupby(df_bar.index).first())
leads to:
value
date
2016-05-19 19736
2016-05-20 17491
2016-05-26 18060
2016-05-27 19997
2016-05-28 19044
... ...
2019-11-23 146658
2019-11-24 138875
2019-11-30 141161
2019-12-01 142918
2019-12-03 158549
How do I group this by year? Maybe also explain how to plot the grouped data accurately as a bar chart with matplotlib.

This will group the data by year:
df_year_wise_sum = df.groupby([df.index.year]).sum()
This line of code will give a bar plot:
df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
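For anyone who wants to try this without the original CSV, here is a minimal, self-contained sketch of the same idea. The page-view numbers are randomly generated stand-ins (an assumption, not the real data), and the axis labels are only suggestions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for fcc-forum-pageviews.csv (values are made up).
idx = pd.date_range("2016-05-09", "2019-12-03", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(1_000, 200_000, size=len(idx))}, index=idx)
df.index.name = "date"

# Group on the year component of the DatetimeIndex and aggregate each group.
yearly_sum = df.groupby(df.index.year)["value"].sum()

fig, ax = plt.subplots()
yearly_sum.plot(kind="bar", ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Total page views")
# fig.savefig("bar_plot.png")  # uncomment to write the image to disk
```

Swapping `.sum()` for `.mean()` gives the average per year instead of the total.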

Related

Plotting Closing price of SBIN NSE but it is plotting 3:30pm-9:15am also

The DataFrame only has times from 9:15 am to 3:30 pm on every working day, but when it is plotted, matplotlib also draws the span between 3:30 pm and 9:15 am the next day. How can I fix this?
I can't figure out how to get a continuous figure. Here is the CSV.
I tried using:
import matplotlib.pyplot as plt
import pandas as pd
#data = the read file in the link
data = pd.read_csv('sbin.csv')
plt.plot(data['MA_50'], label='MA 50', color='red')
plt.plot(data['MA_10'], label='MA 10', color='blue')
plt.legend(loc='best')
plt.xlim(data.index[0], data.index[-1])
plt.xlabel('Time')
plt.ylabel('Price')
plt.show()
I expect 9:15 to follow directly after 3:30.
Have you tried using mplfinance?
Using the data you posted:
import mplfinance as mpf
import pandas as pd
df = pd.read_csv('sbin.csv', index_col=0, parse_dates=True)
mpf.plot(df, type='candle', ema=(10,50), style='yahoo')
The result:
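If you would rather stay with plain matplotlib, one common workaround is to plot against the row position instead of the timestamp, and then label a few ticks with the matching times. This is only a sketch: the three sessions of 15-minute bars and the Close values below are made up and stand in for sbin.csv, which is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up 15-minute bars for three trading sessions, 09:15 to 15:30 each.
days = pd.bdate_range("2021-01-04", periods=3)
open_offset = pd.Timedelta(hours=9, minutes=15)
close_offset = pd.Timedelta(hours=15, minutes=30)
stamps = pd.DatetimeIndex(
    [ts for day in days
     for ts in pd.date_range(day + open_offset, day + close_offset, freq="15min")]
)
data = pd.DataFrame({"Close": np.linspace(280, 300, len(stamps))}, index=stamps)

# Plot against 0..n-1 so that closed-market hours take up no horizontal space.
pos = np.arange(len(data))
fig, ax = plt.subplots()
ax.plot(pos, data["Close"], label="Close")

# Label a handful of positions with the timestamps they correspond to.
tick_pos = pos[:: len(data) // 6]
ax.set_xticks(tick_pos)
ax.set_xticklabels(data.index[tick_pos].strftime("%d %b %H:%M"), rotation=45)
ax.legend(loc="best")
fig.tight_layout()
```

The trade-off is that the x axis is no longer a true time axis, so date locators and formatters from matplotlib.dates do not apply to it.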

Is it possible to create a matplotlib barplot of timed values using simple notation?

Is DataFrame['2002':'2005'][['Value1','Value2']].bar(any args?) a possible way to create a bar plot of a DataFrame in which two values are distributed over a long period of time?
I can create a simple plot, but I want bars (1 bar - 1 day).
If there is no such simple way, what would be the simplest one?
Well, it takes a bit of manual coding, but it is possible and it works (thanks to these guys here).
Suppose you have a dataframe like
df.head()
temperature pressure
2018-10-01 21.860016 1031.418143
2018-10-02 20.590761 1063.008550
2018-10-03 21.356381 1047.183300
2018-10-04 20.393329 1037.710172
2018-10-05 20.716377 1027.680324
... ... ...
Then you'd need to import
import matplotlib.pyplot as plt
import matplotlib.dates as md
to plot your data like
fig, ax = plt.subplots()
ax.xaxis_date()
ax.bar(df.index -md.num2timedelta(.2), df.temperature, .4, color='r', label = df.temperature.name)
ax2 = ax.twinx()
ax2.bar(df.index +md.num2timedelta(.2), df.pressure, .4, color='b', label = df.pressure.name)
ax.xaxis.set_major_formatter(md.DateFormatter('%b %d'))
ax.xaxis.set_major_locator(md.WeekdayLocator())
fig.legend()
Result:
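On the "simple notation" part of the question: pandas itself gets fairly close through DataFrame.plot.bar. A minimal sketch, assuming a DataFrame with a DatetimeIndex and two columns on a similar scale. The Value1/Value2 numbers below are randomly generated for illustration and are not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily data with two columns on a comparable scale.
idx = pd.date_range("2018-10-01", periods=14, freq="D")
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {"Value1": rng.normal(21, 0.5, len(idx)),
     "Value2": rng.normal(19, 0.5, len(idx))},
    index=idx,
)

# Slice by date labels, pick the columns, then draw one group of bars per day.
subset = df.loc["2018-10-03":"2018-10-09", ["Value1", "Value2"]]
ax = subset.plot.bar(rot=45)

# Bar plots treat the index as categories, so the date labels are formatted by hand.
ax.set_xticklabels(subset.index.strftime("%b %d"))
```

If the two columns live on very different scales (as temperature and pressure do above), a second y axis like the twinx() approach in the answer is still the more readable choice.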

Python rolling Sharpe ratio with Pandas or NumPy

I am trying to generate a plot of the 6-month rolling Sharpe ratio using Python with Pandas/NumPy.
My input data is below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
print(df.head(20))
returns
Date
2008-01-01 0.232794
2008-01-02 0.957157
2008-01-03 0.079939
2008-01-04 0.772999
2008-01-05 0.708377
2008-01-06 0.579662
2008-01-07 0.998632
2008-01-08 0.432605
2008-01-09 0.499041
2008-01-10 0.693420
2008-01-11 0.330222
2008-01-12 0.109280
2008-01-13 0.776309
2008-01-14 0.079325
2008-01-15 0.559206
2008-01-16 0.748133
2008-01-17 0.747319
2008-01-18 0.936322
2008-01-19 0.211246
2008-01-20 0.755340
What I want
The type of plot I am trying to produce is this or the first plot from here (see below).
My attempt
Here is the equation I am using:
def my_rolling_sharpe(y):
    return np.sqrt(126) * (y.mean() / y.std()) # 21 days per month X 6 months = 126
# Calculate rolling Sharpe ratio
df['rs'] = my_rolling_sharpe(df['returns'])
fig, ax = plt.subplots(figsize=(10, 3))
df['rs'].plot(style='-', lw=3, color='indianred', label='Sharpe')\
.axhline(y = 0, color = "black", lw = 3)
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
fig.tight_layout()
plt.show()
The problem is that I am getting a horizontal line since my function is giving a single value for the Sharpe ratio. This value is the same for all the Dates. In the example plots, they appear to be showing many ratios.
Question
Is it possible to plot a 6-month rolling Sharpe ratio that changes from one day to the next?
Approximately correct solution using df.rolling and a fixed window size of 180 days:
df['rs'] = df['returns'].rolling('180D').apply(my_rolling_sharpe)
This window isn't exactly 6 calendar months wide because rolling requires a fixed window size, so trying window='6MS' (6 Month Starts) throws a ValueError.
To calculate the Sharpe ratio for a window exactly 6 calendar months wide, I'll copy this super cool answer by SO user Mike:
df['rs2'] = [my_rolling_sharpe(df.loc[d - pd.offsets.DateOffset(months=6):d, 'returns'])
for d in df.index]
# Compare the two windows
df.plot(y=['rs', 'rs2'], linewidth=0.5)
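To see the time-based window in isolation, here is a small self-contained sketch. The returns are random numbers drawn around zero (an assumption for illustration, not market data), and min_periods=180 is my own addition so that the first, partially filled windows come out as NaN instead of noisy values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2008-01-01", "2015-12-01", freq="D")
# Made-up daily returns centred near zero (not real market data).
returns = pd.Series(rng.normal(0.0005, 0.01, len(dates)), index=dates, name="returns")

def rolling_sharpe(window: pd.Series) -> float:
    # Same annualisation factor as in the question: 21 days x 6 months = 126.
    return np.sqrt(126) * window.mean() / window.std()

# Time-based window: each row sees the observations from the preceding 180 days,
# including its own day. With daily data that is 180 rows per full window.
rs = returns.rolling("180D", min_periods=180).apply(rolling_sharpe, raw=False)
```

Because the window slides forward one day at a time, `rs` holds a different value for each date, which is what turns the horizontal line into a curve.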
I have prepared an alternative solution to your question, based solely on the window functions from pandas.
Here I have defined the Sharpe ratio calculation "on the fly". Please note the following parameters in my solution:
I have used a risk-free rate of 2%.
The dashed line is just a benchmark for the rolling Sharpe ratio; its value is 1.6.
So the code is the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# Generate sample data
d = pd.date_range(start='1/1/2008', end='12/1/2015')
df = pd.DataFrame(d, columns=['Date'])
df['returns'] = np.random.rand(d.size, 1)
df = df.set_index('Date')
df['rolling_SR'] = df.returns.rolling(180).apply(lambda x: (x.mean() - 0.02) / x.std(), raw = True)
df.fillna(0, inplace = True)
df[df['rolling_SR'] > 0].rolling_SR.plot(style='-', lw=3, color='orange',
label='Sharpe', figsize = (10,7))\
.axhline(y = 1.6, color = "blue", lw = 3,
linestyle = '--')
plt.ylabel('Sharpe ratio')
plt.legend(loc='best')
plt.title('Rolling Sharpe ratio (6-month)')
plt.show()
print('---------------------------------------------------------------')
print('In case you want to check the result data\n')
print(df.tail()) # I use tail because of the size of your window.
You should get something similar to this picture

How can I convert pandas date time xticks to readable format?

I am plotting a time series with a datetime index. The plot needs to be a particular size for the journal format. Consequently, the xticks are not readable since they span many years.
Here is a data sample
2013-02-10 0.7714492098202259
2013-02-11 0.7709101833765016
2013-02-12 0.7704911332770049
2013-02-13 0.7694975914173087
2013-02-14 0.7692108921323576
The data is a series with a datetime index and spans from 2013 to 2016. I use
data.plot(ax = ax)
to plot the data.
How can I format my xticks to read like '13 instead of 2013?
It seems there is some incompatibility between pandas and matplotlib formatters/locators when it comes to dates. See e.g. those questions:
Pandas plot - modify major and minor xticks for dates
Pandas Dataframe line plot display date on xaxis
I'm not entirely sure why it still works in some cases to use matplotlib formatters and not in others. However because of those issues, the bullet-proof solution is to use matplotlib to plot the graph instead of the pandas plotting function.
This allows you to use locators and formatters just as seen in the matplotlib example.
Here the solution to the question would look as follows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2013-01-01", "2017-06-20" )
y = np.cumsum(np.random.normal(size=len(dates)))
s = pd.Series(y, index=dates)
fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
yearFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearFmt)
plt.show()
According to this example, you can do the following
import matplotlib.dates as mdates
yearsFmt = mdates.DateFormatter("'%y")
years = mdates.YearLocator()
ax = df.plot()
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
Full working example below.
Add the word value as a header so that pd.read_clipboard puts the dates into the index:
value
2013-02-10 0.7714492098202259
2014-02-11 0.7709101833765016
2015-02-12 0.7704911332770049
2016-02-13 0.7694975914173087
2017-02-14 0.7692108921323576
Then read in the data and convert the index:
df = pd.read_clipboard(sep=r'\s+')
df.index = pd.to_datetime(df.index)
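If you want to keep the pandas plotting call, the x_compat=True option of DataFrame.plot / Series.plot may help: it suppresses the tick resolution adjustment pandas normally applies to time series, so the matplotlib date locators and formatters see ordinary matplotlib dates. A hedged sketch with randomly generated data (not the series from the question):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up random walk covering roughly the same span as the question's data.
dates = pd.date_range("2013-02-10", "2016-12-31", freq="D")
rng = np.random.default_rng(0)
s = pd.Series(np.cumsum(rng.normal(size=len(dates))), index=dates)

fig, ax = plt.subplots()
s.plot(ax=ax, x_compat=True)  # let matplotlib handle the date axis
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("'%y"))

# Draw once so the tick label texts are populated, then read them back.
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

After drawing, `labels` holds the two-digit year strings, which makes it easy to check the formatter before exporting the figure at journal size.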

Plotting period series in matplotlib pyplot

I'm trying to plot time series revenue data by quarter with matplotlib.pyplot but keep getting an error. Below are my code and the error. The desired behavior is to plot the revenue data by quarter using matplotlib. When I try to do this, I get:
TypeError: Axis must have `freq` set to convert to Periods
Is it because time series dates expressed as periods cannot be plotted in matplotlib? Below is my code.
def parser(x):
    return pd.to_datetime(x, format='%m%Y')
tot = pd.read_table('C:/Desktop/data.txt', parse_dates=[2], index_col=[2], date_parser=parser)
tot = tot.dropna()
tot = tot.to_period('Q').reset_index().groupby(['origin', 'date'], as_index=False).agg(sum)
tot.head()
origin date rev
0 KY 2016Q2 1783.16
1 TN 2014Q1 32128.36
2 TN 2014Q2 16801.40
3 TN 2014Q3 33863.39
4 KY 2014Q4 103973.66
plt.plot(tot.date, tot.rev)
If you want to use matplotlib, the following code should give you the desired plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'origin': ['KY','TN','TN','TN','KY'],
'date': ['2016Q2','2014Q1','2014Q2','2014Q3','2014Q4'],
'rev': [1783.16, 32128.36, 16801.40, 33863.39, 103973.66]})
x = np.arange(0,len(df),1)
fig, ax = plt.subplots(1,1)
ax.plot(x,df['rev'])
ax.set_xticks(x)
ax.set_xticklabels(df['date'])
plt.show()
You could use the xticks command and represent the data with a bar chart with the following code:
plt.bar(range(len(df.rev)), df.rev, align='center')
plt.xticks(range(len(df.rev)), df.date, size='small')
It seems like a bug.
DataFrame.plot works for me:
ooc.plot(x='date', y='rev')
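Another option, if you want a real time axis in matplotlib, is to convert the periods to timestamps before plotting. This is a sketch under assumptions: the frame below is rebuilt by hand from the tot.head() output in the question, and the date column is assumed to have a quarterly period dtype.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Rebuilt from the tot.head() output shown in the question.
tot = pd.DataFrame({
    "origin": ["KY", "TN", "TN", "TN", "KY"],
    "date": pd.PeriodIndex(["2016Q2", "2014Q1", "2014Q2", "2014Q3", "2014Q4"], freq="Q"),
    "rev": [1783.16, 32128.36, 16801.40, 33863.39, 103973.66],
})

# Sort chronologically, then map each quarter to the timestamp of its first day.
tot = tot.sort_values("date")
x = tot["date"].dt.to_timestamp()

fig, ax = plt.subplots()
ax.plot(x.to_numpy(), tot["rev"], marker="o")
# Put one tick on each quarter and label it in the original "2014Q1" style.
ax.set_xticks(x.to_numpy())
ax.set_xticklabels(tot["date"].astype(str), rotation=45)
fig.tight_layout()
```

Unlike the np.arange approach above, the spacing on the x axis here reflects the actual time between quarters, so the gap between 2014Q4 and 2016Q2 stays visible.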