How can I graph a candlestick chart with a DataFrame in Python? - matplotlib

I am not able to figure out how to graph a candlestick OHLC chart with Python. I've had this issue ever since matplotlib.finance was deprecated. Thanks for your help!
The DataFrame "quotes" comes from an Excel file (which I can't paste here) and has the following columns:
Index(['Date', 'Open', 'High', 'Low', 'Close'], dtype='object')
It also has the default index. The values in the 'Date' column are of type pandas._libs.tslibs.timestamps.Timestamp.
When I run the code I get the following error:
File "", line 30, in
candlestick_ohlc(ax, zip(mdates.date2num(quotes.index.to_pydatetime()),
AttributeError: 'RangeIndex' object has no attribute 'to_pydatetime'
Here is my code:
import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import MONDAY, DateFormatter, DayLocator, WeekdayLocator
from mpl_finance import candlestick_ohlc
date1 = "2004-2-1"
date2 = "2004-4-12"
mondays = WeekdayLocator(MONDAY)
alldays = DayLocator()
weekFormatter = DateFormatter('%b %d')
dayFormatter = DateFormatter('%d')
fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
candlestick_ohlc(ax, zip(mdates.date2num(quotes.index.to_pydatetime()),
quotes['Open'], quotes['High'],
quotes['Low'], quotes['Close']),
width=0.6)
ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45,
horizontalalignment='right')
plt.show()

If you don't specify an index while building your DataFrame, pandas defaults to a RangeIndex that simply numbers your rows consecutively. A RangeIndex holds integers, not dates, so it has no to_pydatetime method; hence the error. The read_excel function takes an index_col parameter to specify which column to use as the index. You might also have to provide parse_dates=True.
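A minimal sketch of the fix, assuming the data is already in memory with 'Date' as an ordinary column (the prices below are made up). set_index('Date') turns that column into a DatetimeIndex, which does have to_pydatetime(). When reading the file yourself, pd.read_excel(path, index_col='Date', parse_dates=True) should do the same in one step, where path is a placeholder for your file.

```python
import matplotlib.dates as mdates
import pandas as pd

# Made-up stand-in for the Excel data: 'Date' is a regular column,
# so the DataFrame starts out with a default RangeIndex.
quotes = pd.DataFrame({
    'Date': pd.to_datetime(['2004-02-02', '2004-02-03', '2004-02-04']),
    'Open': [10.0, 10.5, 10.2],
    'High': [10.8, 10.9, 10.6],
    'Low': [9.9, 10.1, 9.8],
    'Close': [10.5, 10.2, 10.4],
})

# Move 'Date' into the index; the result is a DatetimeIndex
quotes = quotes.set_index('Date')

# Same zip as in the question: (date number, open, high, low, close) rows,
# the sequence format candlestick_ohlc takes as its second argument
ohlc = list(zip(mdates.date2num(quotes.index.to_pydatetime()),
                quotes['Open'], quotes['High'],
                quotes['Low'], quotes['Close']))
print(ohlc[0])
```

The resulting list can then be passed as candlestick_ohlc(ax, ohlc, width=0.6), exactly as in the question.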

Related

matplotlib.axis.axes error in mplfinance for volume

I am working with stock data in a DataFrame named daily.
My code is:
import pandas as pd
import mplfinance as mpf
import matplotlib.pyplot as plt
data = pd.read_csv('/content/drive/MyDrive/python/TEchAnalysis.csv')
figdims = (15, 10)
fig, ax = plt.subplots(figsize=figdims)
mpf.plot(daily, type='candle', mav=(5, 10, 20, 50, 100), volume=True, ax=ax)
I get the following error:
ValueError: `volume` must be of type `matplotlib.axis.Axes`
Can somebody explain this error and how to fix it?
If you pass external axes with ax=, you must also pass an Axes instance for the volume. From the mplfinance documentation on external axes:
Please note the following:
Use kwarg ax= to pass any matplotlib Axes that you want into mpf.plot()
If you also want to plot volume, then you must pass in an Axes instance for the volume, so instead of volume=True, use volume=<myVolumeAxesInstance>.
If you specify ax= for mpf.plot() then you must also specify ax= for all calls to make_addplot().
Try this:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import mplfinance as mpf
import pandas as pd
import yfinance as yf
%matplotlib inline
df = yf.download('aapl', '2015-01-01', '2021-01-01')
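df.rename(columns={'Adj Close': 'Adj_close'}, inplace=True)
df1 = df.copy().loc['2015-01':'2015-02', :]
fig, ax1 = plt.subplots(figsize=(12, 6))
fig.set_facecolor('#ffe8a8')
ax1.set_zorder(1)  # draw the price Axes above the volume Axes
ax1.grid(True, color='k', linestyle='--')
ax1.set_frame_on(False)  # no background patch, so ax2 stays visible underneath
ax2 = ax1.twinx()  # second Axes sharing the x-axis, used for volume
ax2.grid(False)
# volume= receives the Axes instance instead of True
mpf.plot(df1, ax=ax1, type='candle', volume=ax2,
         xlim=(df1.index[0], df1.index[-1]))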
plt.show()
It works fairly well and leaves room for customization.

Add a category without data in it to a plot in seaborn

I am plotting some data as a catplot like this:
ax = sns.catplot(x='Kind', y='VAF', hue='Sample', jitter=True, data=df, legend=False)
The trouble is that some of the categories on the x-axis ('Kind') contain no data, and the corresponding labels are not added to the plot. Is there a way to keep those labels but just not plot any points for them?
Here is a reproducible example to help explain:
x=pd.DataFrame({'Data':[1,3,4,6,3,2],'Number':['One','One','One','One','Three','Three']})
plt.figure()
ax = sns.catplot(x='Number', y='Data', jitter=True, data=x)
In this plot you can see that on the x-axis, samples One and Three are displayed. But imagine that there is also a sample Two that just had no data points in it. How can I display One, Two, and Three on the x-axis?
Order parameter
Of course, you need to know in advance which categories are expected. Given such a list, you can pass it to the order parameter.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'Data':[1,3,4,6,3,2],
'Number':['One','One','One','One','Three','Three']})
exp_cats = ["One", "Two", "Three"]
ax = sns.stripplot(x='Number', y='Data', jitter=True, data=df, order=exp_cats)
plt.show()
Alternatives
The above works with matplotlib 2.2.3, but not with 3.0. It works again with the development version at the time of writing (i.e. 3.1). For the moment, there are the following alternatives:
A. Looping over categories
Given a list of expected categories, one can just loop over them and plot a scatter of each category.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Data':[1,3,4,6,3,2],
'Number':['One','One','One','One','Three','Three']})
exp_cats = ["One", "Two", "Three"]
for i, cat in enumerate(exp_cats):
    cdf = df[df["Number"] == cat]
    # place each category at x = i, with a little horizontal jitter
    x = np.zeros(len(cdf)) + i + .2 * (np.random.rand(len(cdf)) - 0.5)
    plt.scatter(x, cdf["Data"].values)
plt.xticks(range(len(exp_cats)), exp_cats)
plt.show()
B. Mapping categories to numbers
You can map the expected categories to numbers and plot numbers instead of categories.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Data':[1,3,4,6,3,2],
'Number':['One','One','One','One','Three','Three']})
exp_cats = ["One", "Two", "Three"]
df["IntNumber"] = df["Number"].map(dict(zip(exp_cats, range(len(exp_cats)))))
plt.scatter(df["IntNumber"] + .2*(np.random.rand(len(df))-0.5), df["Data"].values,
c = df["IntNumber"].values.astype(int))
plt.xticks(range(len(exp_cats)), exp_cats)
plt.show()
C. Appending missing categories to the dataframe
Finally, you may append rows with NaN values so that each expected category appears in the dataframe.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'Data':[1,3,4,6,3,2],
'Number':['One','One','One','One','Three','Three']})
exp_cats = ["One", "Two", "Three"]
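# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
dfa = pd.concat([df, pd.DataFrame({'Data': [np.nan] * len(exp_cats), 'Number': exp_cats})],
                ignore_index=True)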
ax = sns.stripplot(x='Number', y='Data', jitter=True, data=dfa, order=exp_cats)
plt.show()
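Another option, not covered in the answer: store the column as a pandas Categorical whose categories list every expected value. As far as I know, seaborn takes the category order (including unused categories) from a categorical column when order is not given, so 'Two' should still get a slot on the axis; treat this as an assumption to check against your seaborn version. The output filename is arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'Data': [1, 3, 4, 6, 3, 2],
                   'Number': ['One', 'One', 'One', 'One', 'Three', 'Three']})
exp_cats = ["One", "Two", "Three"]

# Declare all expected categories, including 'Two', which has no rows
df['Number'] = pd.Categorical(df['Number'], categories=exp_cats)

ax = sns.stripplot(x='Number', y='Data', jitter=True, data=df)
ax.get_figure().savefig('strip_with_empty_category.png')
```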

Only plotting observed dates in matplotlib, skipping range of dates

I have a simple dataframe I am plotting in matplotlib. However, the plot is showing the range of the dates, rather than just the two observed data points.
How can I only plot the two data points and not the range of the dates?
df structure:
Date                 Number
2018-01-01 12:00:00  1
2018-02-01 12:00:00  2
Output of the matplotlib code:
Here is what I expected (this was done using a string and not a date on the x-axis data):
df code:
import pandas as pd
df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00',2]], columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index(['Date'],inplace=True)
Plot code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots(
figsize=(4,5),
dpi=72
)
width = 0.75
#starts the bar chart creation
ax1.bar(df.index, df['Number'],
width,
align='center',
color=('#666666', '#333333'),
edgecolor='#FF0000',
linewidth=2
)
ax1.set_ylim(0,3)
ax1.set_ylabel('Score')
fig.autofmt_xdate()
#Title
plt.title('Scores by group and gender')
plt.tight_layout()
plt.show()
Try adding something like:
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%y-%m-%d')
ax1.xaxis.set_major_formatter(myFmt)
plt.xticks(df.index)
The dates are converted to numbers (in units of days) at plot time, so width = 0.75 is very small compared with the month between the two dates. Try something bigger (like width = 20).
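A variant of the same idea, sketched under the assumption that you would rather state the bar width as a time span than as a bare number: matplotlib's bar() first tries to add the width to the first x value, so a pandas Timedelta should give bars about 20 days wide. The 20-day value and the output filename are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00', 2]],
                  columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

fig, ax1 = plt.subplots(figsize=(4, 5), dpi=72)
# Width given as a time span; matplotlib converts it to date units (days)
ax1.bar(df.index, df['Number'], width=pd.Timedelta(days=20), align='center',
        color=('#666666', '#333333'), edgecolor='#FF0000', linewidth=2)
ax1.set_xticks(df.index)  # ticks only at the two observed dates
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%y-%m-%d'))
ax1.set_ylim(0, 3)
ax1.set_ylabel('Score')
fig.autofmt_xdate()
fig.savefig('scores_by_date.png')
```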
Matplotlib bar plots are numeric in nature. If you want a categorical bar plot instead, you may use pandas bar plots.
df.plot.bar()
You may then want to tidy up the labels a bit:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00',2]], columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index(['Date'],inplace=True)
ax = df.plot.bar()
ax.tick_params(axis="x", rotation=0)
ax.set_xticklabels([t.get_text().split()[0] for t in ax.get_xticklabels()])
plt.show()

convert pandas datetime64[ns] to matplotlib date-float for date x-axis in seaborn tsplot

OK, I'm trying to do something that should be trivial, but instead I've spent more time than I'd like to admit searching Google and Stack Overflow, only to become more frustrated.
What I'm trying to do: I'd like to format my x-axis on a seaborn tsplot.
What my Stack Overflow searching has told me: matplotlib has a set_major_formatter function, but I can't seem to use it without tripping an overflow error.
What I'm looking for: a simple way to convert datetime64[ns] to a float that can be used with matplotlib's set_major_formatter.
Where I think I'm stuck:
df.date_action = df.date_action.values.astype('float')
# converts the field to a float, but matplotlib expects days since its own date epoch, not nanoseconds since the Unix epoch
Is there a simple way to do this that I'm missing?
The most helpful post I reviewed so far was 31255815, which got me 95% of the way there, but not quite.
Here is some sample code to illustrate the issue:
# standard imports
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import seaborn as sns; sns.set()
## generate fake data
from datetime import timedelta, date
import random
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
start_date = date(2013, 1, 1)
end_date = date(2018, 6, 2)
date_list = []
number_list = []
for single_date in daterange(start_date, end_date):
    date_list.append(single_date)
    if len(number_list) > 0:
        number_list.append(random.random() + number_list[-1])
    else:
        number_list.append(random.random())
df = pd.DataFrame(data={'date_action': date_list, 'values': number_list})
# note my actual data comes in as a datetime64[ns]
df['date_action'] = df['date_action'].astype('datetime64[ns]')
# the following looked promising but is still offset an incorrect amount
#df.date_action = df.date_action.values.astype('float')
#df.date_action = df.date_action.to_datetime
## chart stuff
plt.clf()
df['dummy_01'] = 0
rows = 1
cols = 1
fig, axs = plt.subplots(nrows=rows, ncols=cols, figsize=(10, 8))
ax1 = plt.subplot2grid((rows, cols), (0, 0))
for i in [ax1]:  # trying to format x-axis
    i.xaxis_date()
    i.xaxis.set_major_locator(mdates.AutoDateLocator())
    i.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
sns.tsplot(df, time='date_action', unit='dummy_01',
value='values', ax=ax1) #
plt.plot()
plt.show()
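A minimal sketch of the conversion asked about, not taken from the original thread: matplotlib.dates.date2num accepts datetime64[ns] values directly and returns floats in matplotlib's date units (days since its epoch), which is what DateFormatter expects. Because sns.tsplot was deprecated in seaborn 0.9 and later removed, the sketch plots with plain matplotlib; the data is made up and the filename is arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data shaped like the question's: a datetime64[ns] column plus values
dates = pd.date_range('2013-01-01', '2018-06-01', freq='D')
df = pd.DataFrame({'date_action': dates,
                   'values': np.random.default_rng(0).random(len(dates)).cumsum()})

# date2num turns datetime64 values into floats in matplotlib's date units (days)
df['date_num'] = mdates.date2num(df['date_action'])

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot(df['date_num'], df['values'])
# The x values are now in the units the date locator and formatter expect
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()
fig.savefig('dates_formatted.png')
```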

Formatting index of a pandas table in a plot

I am trying to annotate my plot with part of a dataframe. However, the time 00:00:00 is appearing in all the row labels. Is there a clean way to remove them since my data is daily in frequency? I have tried the normalize function but that doesn't remove the time; it just zeroes the time.
Here is what the issue looks like and the sample code to reproduce the issue.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import table  # pandas.tools.plotting in older pandas versions
# Setup of mock data
date_range = pd.date_range('2014-01-01', '2015-01-01', freq='MS')
df = pd.DataFrame({'Values': np.random.rand(len(date_range))}, index=date_range)
# The plotting of the table
fig7 = plt.figure()
ax10 = plt.subplot2grid((1, 1), (0, 0))
table(ax10, np.round(df.tail(5), 2), loc='center', colWidths=[0.1] * 2)
fig7.show()
Simply use the .date attribute of the DatetimeIndex, so that each element of the index becomes a datetime.date object.
The elements of a DatetimeIndex are pandas Timestamps (a subclass of datetime.datetime), so they always carry a time of day, which is why 00:00:00 shows up even though you never set a time explicitly.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import table  # pandas.tools.plotting in older pandas versions
np.random.seed(42)
# Setup of mock data
date_range = pd.date_range('2014-01-01', '2015-01-01', freq='MS')
df = pd.DataFrame({'Values': np.random.rand(len(date_range))}, date_range)
df.index = df.index.date # <------ only change here
# The plotting of the table
fig7 = plt.figure()
ax10 = plt.subplot2grid((1, 1), (0, 0))
table(ax10, np.round(df.tail(5), 2), loc='center', colWidths=[0.1] * 2)
fig7.show()
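If you prefer to control the label format explicitly, DatetimeIndex.strftime is an alternative to .date: it turns the index into plain strings, so the table shows exactly what the format string says. A minimal sketch, assuming a pandas version where table lives in pandas.plotting; the '%Y-%m-%d' format and the filename are arbitrary choices.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import table

np.random.seed(42)
date_range = pd.date_range('2014-01-01', '2015-01-01', freq='MS')
df = pd.DataFrame({'Values': np.random.rand(len(date_range))}, index=date_range)

# Format each timestamp as a string; no time component is kept
df.index = df.index.strftime('%Y-%m-%d')

fig = plt.figure()
ax = plt.subplot2grid((1, 1), (0, 0))
ax.axis('off')  # hide the empty axes behind the table
tbl = table(ax, np.round(df.tail(5), 2), loc='center', colWidths=[0.1] * 2)
fig.savefig('table_dates.png')
```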