How can I convert pandas date time xticks to readable format? - pandas

I am plotting a time series with a datetime index. The plot needs to be a particular size for the journal format. Consequently, the xticks are not readable since they span many years.
Here is a data sample
2013-02-10 0.7714492098202259
2013-02-11 0.7709101833765016
2013-02-12 0.7704911332770049
2013-02-13 0.7694975914173087
2013-02-14 0.7692108921323576
The data is a series with a datetime index and spans from 2013 to 2016. I use
data.plot(ax = ax)
to plot the data.
How can I format my xticks to read like '13 instead of 2013?

There seems to be some incompatibility between pandas and matplotlib formatters/locators when it comes to dates. See, e.g., these questions:
Pandas plot - modify major and minor xticks for dates
Pandas Dataframe line plot display date on xaxis
I'm not entirely sure why matplotlib formatters still work in some cases and not in others. However, because of those issues, the most reliable solution is to plot the graph with matplotlib instead of the pandas plotting function.
This allows you to use locators and formatters just as in the matplotlib example.
Here, the solution to the question would look as follows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2013-01-01", "2017-06-20" )
y = np.cumsum(np.random.normal(size=len(dates)))
s = pd.Series(y, index=dates)
fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator())
yearFmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_formatter(yearFmt)
plt.show()
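If you would rather keep the pandas plotting call, one option that often helps is passing x_compat=True, which asks pandas to skip its own date tick handling so that matplotlib locators and formatters apply directly. The following is only a sketch under assumptions: the series is synthetic, and the figure size is an arbitrary stand-in for the journal format.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption: daily DatetimeIndex)
dates = pd.date_range("2013-01-01", "2016-12-31")
s = pd.Series(np.cumsum(np.random.normal(size=len(dates))), index=dates)

fig, ax = plt.subplots(figsize=(4, 3))  # arbitrary small size

# x_compat=True makes pandas use ordinary matplotlib date values on the x-axis
s.plot(ax=ax, x_compat=True)

# Standard matplotlib date locator and formatter: one tick per year, shown as '13
year_fmt = mdates.DateFormatter("'%y")
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(year_fmt)

# Call plt.show() or fig.savefig(...) to view the result
```

If the labels still look wrong with this approach, falling back to ax.plot as shown above remains the safer route.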

According to this example, you can do the following
import matplotlib.dates as mdates
yearsFmt = mdates.DateFormatter("'%y")
years = mdates.YearLocator()
ax = df.plot()
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
Full working example below
Add the word value as a header so that pd.read_clipboard puts the dates into the index
value
2013-02-10 0.7714492098202259
2014-02-11 0.7709101833765016
2015-02-12 0.7704911332770049
2016-02-13 0.7694975914173087
2017-02-14 0.7692108921323576
Then read in data and convert index
df = pd.read_clipboard(sep=r'\s+')
df.index = pd.to_datetime(df.index)

Related

Plotting Closing price of SBIN NSE but it is plotting 3:30pm-9:15am also

The dataframe only has times from 9:15 am to 3:30 pm on every working day, but when it is plotted as a chart, matplotlib also plots the times between 3:30 pm and 9:15 am the next day. What is the solution?
I can't figure out how to get a continuous figure. Here is the CSV.
I tried using
import matplotlib.pyplot as plt
import pandas as pd
#data = the read file in the link
data = pd.read_csv('sbin.csv')
plt.plot(data['MA_50'], label='MA 50', color='red')
plt.plot(data['MA_10'], label='MA 10', color='blue')
plt.legend(loc='best')
plt.xlim(data.index[0], data.index[-1])
plt.xlabel('Time')
plt.ylabel('Price')
plt.show()
I expect 9:15 to follow directly after 3:30.
Have you tried using mplfinance?
Using the data you posted:
import mplfinance as mpf
import pandas as pd
df = pd.read_csv('sbin.csv', index_col=0, parse_dates=True)
mpf.plot(df, type='candle', ema=(10,50), style='yahoo')

pandas.groupby --> DatetimeIndex --> groupby year

I come from JavaScript and am struggling. I need to sort data by a DatetimeIndex and then group it by year.
The CSV looks like this (I shortened it because it has more than 1300 entries):
date,value
2016-05-09,1201
2017-05-10,2329
2018-05-11,1716
2019-05-12,10539
I wrote my code like this to throw away the first and last 2.5 percent of the dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
df = pd.read_csv( "fcc-forum-pageviews.csv", index_col="date", parse_dates=True).sort_values('value')
n = len(df)  # 2.5 must be written with a decimal point; "2,5" creates a tuple
df = df.iloc[int(round(n * 0.025)):int(round(n * 0.975 - 1))]
df = df.sort_index()
Now I need to group my DatetimeIndex by year to plot it properly with matplotlib. I am stuck right here:
def draw_bar_plot():
    df_bar = df
    fig, ax = plt.subplots()
    fig.figure.savefig('bar_plot.png')
    return fig
I really don't know how to group by year.
Doing something like:
print(df_bar.groupby(df_bar.index).first())
leads to:
value
date
2016-05-19 19736
2016-05-20 17491
2016-05-26 18060
2016-05-27 19997
2016-05-28 19044
... ...
2019-11-23 146658
2019-11-24 138875
2019-11-30 141161
2019-12-01 142918
2019-12-03 158549
How do I group this by year? Could you also explain how to plot the data accurately as a bar chart with matplotlib?
This will group the data by year
df_year_wise_sum = df.groupby([df.index.year]).sum()
This line of code will give a bar plot
df_year_wise_sum.plot(kind='bar')
plt.savefig('bar_plot.png')
plt.show()
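To make the grouping step concrete, here is a small sketch on made-up numbers (the values and dates below are not from the real CSV). It groups on the year of the DatetimeIndex, and also shows one possible year-by-month layout that is handy for grouped bar charts.

```python
import pandas as pd

# Small made-up frame standing in for the page-view data (assumption)
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=pd.to_datetime(
        ["2016-05-09", "2016-12-31", "2017-05-10", "2018-05-11", "2018-06-01"]
    ),
)
df.index.name = "date"

# Year and month components of the DatetimeIndex, renamed for readable output
years = df.index.year.rename("year")
months = df.index.month.rename("month")

# One value per year
yearly_sum = df.groupby(years)["value"].sum()
yearly_mean = df.groupby(years)["value"].mean()

# One row per year, one column per month; missing combinations become NaN
by_year_month = df.groupby([years, months])["value"].mean().unstack()
```

Calling by_year_month.plot(kind='bar') would then draw one group of bars per year, with one bar per month that has data.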

How to plot a time serie having only business day without jump between the missing days [duplicate]

ax.plot_date((dates, dates), (highs, lows), '-')
I'm currently using this command to plot financial highs and lows using Matplotlib. It works great, but how do I remove the blank spaces in the x-axis left by days without market data, such as weekends and holidays?
I have lists of dates, highs, lows, closes and opens. I can't find any examples of creating a graph with an x-axis that show dates but doesn't enforce a constant scale.
One of the advertised features of scikits.timeseries is "Create time series plots with intelligently spaced axis labels".
You can see some example plots here. In the first example (shown below) the 'business' frequency is used for the data, which automatically excludes holidays and weekends and the like. It also masks missing data points, which you see as gaps in this plot, rather than linearly interpolating them.
Up to date answer (2018) with Matplotlib 2.1.2, Python 2.7.12
The function equidate_ax handles everything you need for a simple date x-axis with equidistant spacing of data points. Realised with ticker.FuncFormatter based on this example.
from __future__ import division
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import datetime
def equidate_ax(fig, ax, dates, fmt="%Y-%m-%d", label="Date"):
    """
    Sets all relevant parameters for an equidistant date-x-axis.
    Tick Locators are not affected (set automatically)

    Args:
        fig: pyplot.figure instance
        ax: pyplot.axis instance (target axis)
        dates: iterable of datetime.date or datetime.datetime instances
        fmt: Display format of dates
        label: x-axis label
    Returns:
        None
    """
    N = len(dates)
    def format_date(index, pos):
        index = np.clip(int(index + 0.5), 0, N - 1)
        return dates[index].strftime(fmt)
    ax.xaxis.set_major_formatter(FuncFormatter(format_date))
    ax.set_xlabel(label)
    fig.autofmt_xdate()
#
# Some test data (with python dates)
#
dates = [datetime.datetime(year, month, day) for year, month, day in [
    (2018,2,1), (2018,2,2), (2018,2,5), (2018,2,6), (2018,2,7), (2018,2,28)
]]
y = np.arange(6)
# Create plots. Left plot is default with a gap
fig, [ax1, ax2] = plt.subplots(1, 2)
ax1.plot(dates, y, 'o-')
ax1.set_title("Default")
ax1.set_xlabel("Date")
# Right plot will show equidistant series
# x-axis must be the indices of your dates-list
x = np.arange(len(dates))
ax2.plot(x, y, 'o-')
ax2.set_title("Equidistant Placement")
equidate_ax(fig, ax2, dates)
I think you need to "artificially synthesize" the exact form of plot you want by using xticks to set the tick labels to the strings representing the dates (of course placing the ticks at equispaced intervals even though the dates you're representing aren't equispaced) and then using a plain plot.
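A rough sketch of that idea, using made-up dates and values: the points are plotted against plain integer positions, and the tick labels are set to the formatted date strings, so the gaps between dates never appear on the axis.

```python
import datetime

import matplotlib.pyplot as plt
import numpy as np

# Made-up data with weekend and multi-week gaps (assumption)
dates = [datetime.date(2018, 2, d) for d in (1, 2, 5, 6, 7, 28)]
values = [3, 4, 2, 5, 6, 4]

# Equally spaced integer positions, one per data point
positions = np.arange(len(dates))
labels = [d.strftime("%Y-%m-%d") for d in dates]

fig, ax = plt.subplots()
ax.plot(positions, values, "o-")

# Put a tick at every position and label it with the matching date string
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right")

# Call plt.show() or fig.savefig(...) to view the result
```

For long series you would probably label only every n-th position rather than every point.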
I will typically use NumPy's NaN (not a number) for values that are invalid or not present. They are represented by Matplotlib as gaps in the plot and NumPy is part of pylab/Matplotlib.
>>> import pylab
>>> xs = pylab.arange(10.) + 733632. # valid date range
>>> ys = [1,2,3,2,pylab.nan,2,3,2,5,2.4] # some data (one undefined)
>>> pylab.plot_date(xs, ys, ydate=False, linestyle='-', marker='')
[<matplotlib.lines.Line2D instance at 0x0378D418>]
>>> pylab.show()
I ran into this problem again and was able to create a decent function to handle this issue, especially concerning intraday datetimes. Credit to @Primer for this answer.
import matplotlib.pyplot as plt
import numpy as np

def plot_ts(ts, step=5, figsize=(10,7), title=''):
    """
    plot timeseries ignoring date gaps

    Params
    ------
    ts : pd.DataFrame or pd.Series
    step : int, display interval for ticks
    figsize : tuple, figure size
    title: str
    """
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(range(ts.dropna().shape[0]), ts.dropna())
    ax.set_title(title)
    ax.set_xticks(np.arange(len(ts.dropna())))
    ax.set_xticklabels(ts.dropna().index.tolist())
    # tick visibility, can be slow for 200,000+ ticks
    xticklabels = ax.get_xticklabels() # generate list once to speed up function
    for i, label in enumerate(xticklabels):
        if not i%step==0:
            label.set_visible(False)
    fig.autofmt_xdate()
You can simply change dates to strings:
import matplotlib.pyplot as plt
import datetime
f = plt.figure(1, figsize=(10,5))
ax = f.add_subplot(111)
today = datetime.datetime.today().date()
yesterday = today - datetime.timedelta(days=1)
three_days_later = today + datetime.timedelta(days=3)
x_values = [yesterday, today, three_days_later]
y_values = [75, 80, 90]
x_values = [f'{x:%Y-%m-%d}' for x in x_values]
ax.bar(x_values, y_values, color='green')
plt.show()
scikits.timeseries functionality has largely been moved to pandas, so you can now resample a Series or DataFrame to only include the values on weekdays.
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> s = pd.Series(list(range(10)), pd.date_range('2015-09-01','2015-09-10'))
>>> s
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-05 4
2015-09-06 5
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
>>> s.resample('B', label='right', closed='right').last()
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
and then plot the result as normal
s.resample('B', label='right', closed='right').last().plot()
plt.show()
Just use mplfinance
https://github.com/matplotlib/mplfinance
import mplfinance as mpf
# df = 'ohlc dataframe'
mpf.plot(df)

Plotting a graph with specific x-axis values

I have plotted a graph using data I have in Excel, and I have the values for the x and y axes. However, I want to change the x-axis so that it shows only specific values that reflect the key days. Is that possible?
Here is the code I have written:
import pandas as pd
from matplotlib import pyplot as plt #download matplot library
#create a graph of the cryptocurrencies in excel
btc = pd.read_excel('/Users/User/Desktop/bitcoin_prices.xlsx')
btc.set_index('Date', inplace=True) #Chart Fit
btc.plot()
plt.xlabel('Date', fontsize= 12)
plt.ylabel('Price ($)', fontsize= 12)
plt.title('Cryptocurrency Prices', fontsize=15)
plt.figure(figsize=(60,40))
plt.show() #plot then show the file
Thank you.
I guess you want the program to recognize the datetime format of the 'Date' column. Supply parse_dates=['Date'] to the loading call. Then you can index your data for certain days. For example:
import datetime as dt
import numpy as np
import pandas as pd
# read_excel (not read_csv) is needed for an .xlsx file
btc = pd.read_excel('my_excel_data.xlsx', parse_dates=['Date'], index_col='Date')
# one date per week in 2015
selected_time = np.arange(dt.datetime(2015, 1, 1), dt.datetime(2016, 1, 1), dt.timedelta(7))
# note: .loc raises a KeyError if any selected date is missing from the index
btc_2015 = btc.loc[selected_time, :]
If you want specific labels for specific dates, you have to read up on matplotlib axes and date formatters.
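As a starting point, here is a hedged sketch of placing ticks only on chosen days. The prices, the key_days list, and the label format are all made up for illustration; the real spreadsheet would replace the synthetic frame.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily prices standing in for the spreadsheet (assumption)
idx = pd.date_range("2015-01-01", "2015-12-31", freq="D")
btc = pd.DataFrame({"Price": np.linspace(200, 450, len(idx))}, index=idx)

# Hypothetical key days that should be the only labelled x positions
key_days = pd.to_datetime(["2015-01-15", "2015-06-01", "2015-11-03"])
tick_positions = mdates.date2num(key_days.to_pydatetime())

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(btc.index, btc["Price"])

# Fixed tick positions at the key days, formatted as e.g. "15 Jan 2015"
ax.set_xticks(tick_positions)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
ax.set_xlabel("Date")
ax.set_ylabel("Price ($)")

# Call plt.show() or fig.savefig(...) to view the result
```

Note that the figure size is passed when the figure is created; calling plt.figure(figsize=...) after plotting opens a new, empty figure instead of resizing the existing one.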

How to I set up x-axis using year-month as tickmarks in matplotlib?

I have a 2D variable XVAR from a netCDF file, with dimensions [year, month]. I want to plot the flattened XVAR (a 1D array of length nyear*nmonth) and set up the x-axis like this: years on major ticks and months on minor ticks. The difficulty is that I do not know how to create a 1D array of dates at a monthly step. There is no monthdelta method that I could use (though I understand that this is because months have different numbers of days).
In the delta=? step below, I tried delta=relativedelta.relativedelta(months=1), but got the error "object has no attribute 'total_seconds'", which I do not completely understand.
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from mpl_toolkits.basemap import Basemap
from datetime import date, timedelta
ncfile = Dataset('filepath',mode='r')
XVAR4d = ncfile.variables['XVAR'][:]
XVAR2d = np.nanmean(XVAR4d,axis=(2,3)).flatten()
yrs = ncfile.variables['YEAR']
stt = date(np.min(yrs),1,1)
end = date(np.max(yrs)+1,1,1)
delta = ?
dates = mdates.drange(stt,end,delta)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(yearsFmt)
ax1.xaxis.set_minor_locator(months)
ax1.set_xlim(stt,end)
ax1.plot(dates,XVAR2d,c='r')
I decided I did like the idea of my second comment, so I am turning it into an actual proposed answer.
Instead of using drange, create the dates yourself:
totalMonths = 12*(np.max(yrs) - np.min(yrs)+1)
dates = mdates.date2num([date(np.min(yrs)+(i//12),i%12+1,1) for i in range(totalMonths)])
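For a self-contained illustration of this approach, the sketch below replaces the netCDF contents with made-up arrays (yrs and xvar2d are assumptions) and builds one date per month before attaching the year and month locators.

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-ins for the netCDF variables (assumption)
yrs = np.array([2000, 2001, 2002])
xvar2d = np.random.rand(len(yrs) * 12)

# First day of every month from January of the first year to December of the last
first_year = int(yrs.min())
n_months = 12 * (int(yrs.max()) - first_year + 1)
dates = [
    datetime.date(first_year + i // 12, i % 12 + 1, 1) for i in range(n_months)
]

fig, ax = plt.subplots()
ax.plot(dates, xvar2d, c="r")

# Years on major ticks, months on minor ticks
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.xaxis.set_minor_locator(mdates.MonthLocator())

# Call plt.show() or fig.savefig(...) to view the result
```

If pandas is available, pd.date_range with freq="MS" (month start) should produce an equivalent sequence of dates.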