Pandas df histogram: format my x ticks and include empty periods

I got this pandas df:
index TIME
12:07 2019-06-03 12:07:28
10:04 2019-06-04 10:04:25
11:14 2019-06-09 11:14:25
...
I use this command to make a histogram of how many occurrences fall into each 15-minute period:
df['TIME'].groupby([df["TIME"].dt.hour, df["TIME"].dt.minute]).count().plot(kind="bar")
My plot looks like this:
How can I get x tick labels like 10:15 instead of (10, 15), and how can I add the missing x ticks (such as 9:15, 9:30, ...) to get a complete timeline?

You can resample your TIME column into 15-minute intervals and count the number of rows in each one, then plot a regular bar chart. Empty intervals are kept with a count of 0.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'TIME': pd.to_datetime('2019-01-01') + pd.to_timedelta(np.random.rand(100) * 3, unit='h')})
df = df[df.TIME.dt.minute > 15]  # make gaps
counts = df.resample('15min', on='TIME').size()  # rows per 15-min bin, 0 for empty bins
ax = counts.plot.bar(rot=0)
ax.set_xticklabels(counts.index.strftime('%H:%M'))  # label each bar as HH:MM
plt.show()
(for details about formatting datetime ticklabels of pandas bar plots see this SO question)
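If TIME spans several days (as in the question's data) and you want counts per 15-minute slot of the day rather than per calendar interval, one option is to floor each timestamp to 15 minutes, count by its HH:MM label, and reindex onto all 96 slots of a day so that empty slots appear with 0. This is a sketch using the three sample rows from the question; labelling only the full hours is an arbitrary readability choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# sample rows from the question
df = pd.DataFrame({'TIME': pd.to_datetime(['2019-06-03 12:07:28',
                                           '2019-06-04 10:04:25',
                                           '2019-06-09 11:14:25'])})

# HH:MM label of the 15-minute slot each timestamp falls into, e.g. "10:00"
slot = df['TIME'].dt.floor('15min').dt.strftime('%H:%M')

# all 96 quarter-hour slots of a day (the date used here is irrelevant)
all_slots = pd.date_range('2000-01-01', periods=96, freq='15min').strftime('%H:%M')

# count per slot; slots without rows get 0 instead of disappearing
counts = slot.value_counts().reindex(all_slots, fill_value=0)

ax = counts.plot.bar(figsize=(14, 4))
# show a label only at every full hour to keep the axis readable
ax.set_xticks(range(0, 96, 4))
ax.set_xticklabels(all_slots[::4], rotation=90)
plt.show()
```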

Related

How to plot a time series with only business days, without jumps across the missing days [duplicate]

ax.plot_date((dates, dates), (highs, lows), '-')
I'm currently using this command to plot financial highs and lows using Matplotlib. It works great, but how do I remove the blank spaces in the x-axis left by days without market data, such as weekends and holidays?
I have lists of dates, highs, lows, closes and opens. I can't find any examples of creating a graph with an x-axis that show dates but doesn't enforce a constant scale.
One of the advertised features of scikits.timeseries is "Create time series plots with intelligently spaced axis labels".
You can see some example plots here. In the first example (shown below) the 'business' frequency is used for the data, which automatically excludes holidays and weekends and the like. It also masks missing data points, which you see as gaps in this plot, rather than linearly interpolating them.
Up to date answer (2018) with Matplotlib 2.1.2, Python 2.7.12
The function equidate_ax handles everything you need for a simple date x-axis with equidistant spacing of data points. It is implemented with ticker.FuncFormatter, based on this example.
from __future__ import division
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import datetime
def equidate_ax(fig, ax, dates, fmt="%Y-%m-%d", label="Date"):
    """
    Sets all relevant parameters for an equidistant date-x-axis.
    Tick Locators are not affected (set automatically)
    Args:
        fig: pyplot.figure instance
        ax: matplotlib Axes instance (target axis)
        dates: sequence (e.g. list) of datetime.date or datetime.datetime instances
        fmt: Display format of dates
        label: x-axis label
    Returns:
        None
    """
    N = len(dates)
    def format_date(index, pos):
        # map the (possibly fractional) tick position to the nearest valid index
        index = np.clip(int(index + 0.5), 0, N - 1)
        return dates[index].strftime(fmt)
    ax.xaxis.set_major_formatter(FuncFormatter(format_date))
    ax.set_xlabel(label)
    fig.autofmt_xdate()
#
# Some test data (with python dates)
#
dates = [datetime.datetime(year, month, day) for year, month, day in [
    (2018,2,1), (2018,2,2), (2018,2,5), (2018,2,6), (2018,2,7), (2018,2,28)
]]
y = np.arange(6)
# Create plots. Left plot is default with a gap
fig, [ax1, ax2] = plt.subplots(1, 2)
ax1.plot(dates, y, 'o-')
ax1.set_title("Default")
ax1.set_xlabel("Date")
# Right plot will show equidistant series
# x-axis must be the indices of your dates-list
x = np.arange(len(dates))
ax2.plot(x, y, 'o-')
ax2.set_title("Equidistant Placement")
equidate_ax(fig, ax2, dates)
plt.show()
I think you need to "artificially synthesize" the exact plot you want: use xticks to set the tick labels to strings representing the dates (placing the ticks at equally spaced intervals, even though the dates they represent aren't equally spaced), and then use a plain plot.
I typically use NumPy's NaN (not a number) for values that are invalid or missing. Matplotlib renders them as gaps in the plot, and NumPy is part of pylab/Matplotlib.
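The idea above (plot against equally spaced positions, then relabel the ticks with date strings) can be sketched as follows. The dates, highs and lows are made-up sample values, and ax.vlines is just one way to draw a high-low bar per day.

```python
import datetime
import matplotlib.pyplot as plt

# hypothetical trading days: Feb 3/4 2018 (a weekend) are absent
dates = [datetime.date(2018, 2, d) for d in (1, 2, 5, 6, 7)]
highs = [10.5, 11.0, 10.8, 11.3, 11.9]
lows = [9.8, 10.1, 10.0, 10.6, 11.0]

# one equally spaced x position per trading day, regardless of calendar gaps
x = list(range(len(dates)))

fig, ax = plt.subplots()
ax.vlines(x, lows, highs)  # one high-low bar per day
ax.set_xticks(x)
ax.set_xticklabels([d.strftime('%Y-%m-%d') for d in dates], rotation=45)
plt.show()
```

Because the x values are plain integers, the weekend simply never takes up space; the cost is that the axis no longer knows these are dates, so date locators and formatters do not apply.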
>>> import pylab
>>> xs = pylab.arange(10.) + 733632. # matplotlib date numbers (days in August 2009 under the pre-3.3 default epoch)
>>> ys = [1,2,3,2,pylab.nan,2,3,2,5,2.4] # some data (one undefined)
>>> pylab.plot_date(xs, ys, ydate=False, linestyle='-', marker='')
[<matplotlib.lines.Line2D instance at 0x0378D418>]
>>> pylab.show()
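Axes.plot_date is discouraged in recent matplotlib releases, and since matplotlib 3.3 raw date numbers such as 733632 are counted from a 1970 epoch by default. A sketch of the same NaN-gap idea with real datetimes and a plain ax.plot (the start date is an arbitrary choice):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# ten consecutive days as real timestamps instead of raw date numbers
xs = pd.date_range('2009-08-13', periods=10, freq='D')
ys = [1, 2, 3, 2, np.nan, 2, 3, 2, 5, 2.4]  # some data (one undefined)

fig, ax = plt.subplots()
line, = ax.plot(xs, ys, linestyle='-', marker='')  # the NaN splits the line in two
fig.autofmt_xdate()
plt.show()
```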
I ran into this problem again and was able to write a decent function to handle it, especially for intraday datetimes. Credit to @Primer for this answer.
import numpy as np
import matplotlib.pyplot as plt
def plot_ts(ts, step=5, figsize=(10,7), title=''):
    """
    plot timeseries ignoring date gaps
    Params
    ------
    ts : pd.DataFrame or pd.Series
    step : int, display interval for ticks
    figsize : tuple, figure size
    title: str
    """
    fig, ax = plt.subplots(figsize=figsize)
    # plot against positions 0..n-1 so missing dates take up no space
    ax.plot(range(ts.dropna().shape[0]), ts.dropna())
    ax.set_title(title)
    ax.set_xticks(np.arange(len(ts.dropna())))
    ax.set_xticklabels(ts.dropna().index.tolist())
    # tick visibility, can be slow for 200,000+ ticks
    xticklabels = ax.get_xticklabels()  # generate list once to speed up function
    for i, label in enumerate(xticklabels):
        if i % step != 0:
            label.set_visible(False)
    fig.autofmt_xdate()
You can simply change dates to strings:
import matplotlib.pyplot as plt
import datetime
f = plt.figure(1, figsize=(10,5))
ax = f.add_subplot(111)
today = datetime.datetime.today().date()
yesterday = today - datetime.timedelta(days=1)
three_days_later = today + datetime.timedelta(days=3)
x_values = [yesterday, today, three_days_later]
y_values = [75, 80, 90]
x_values = [f'{x:%Y-%m-%d}' for x in x_values]
ax.bar(x_values, y_values, color='green')
plt.show()
scikits.timeseries functionality has largely been moved to pandas, so you can now resample a dataframe to only include the values on weekdays.
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> s = pd.Series(list(range(10)), pd.date_range('2015-09-01','2015-09-10'))
>>> s
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-05 4
2015-09-06 5
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
>>> s.resample('B', label='right', closed='right').last()
2015-09-01 0
2015-09-02 1
2015-09-03 2
2015-09-04 3
2015-09-07 6
2015-09-08 7
2015-09-09 8
2015-09-10 9
Then plot the resampled series as usual:
s.resample('B', label='right', closed='right').last().plot()
plt.show()
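For daily data like this, Series.asfreq('B') is a possibly simpler alternative: it keeps only the values stamped on business days instead of aggregating bins (here both give the same result). This is a sketch; the claim that pandas then lays the plot out per business day relies on the index carrying freq 'B', so treat it as something to verify on your pandas version.

```python
import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series(list(range(10)), pd.date_range('2015-09-01', '2015-09-10'))

# keep only business-day timestamps (drops Saturday the 5th and Sunday the 6th)
b = s.asfreq('B')
print(b)

# the index now has freq 'B', which pandas' time-series plotting uses for the x axis
b.plot()
plt.show()
```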
Just use mplfinance
https://github.com/matplotlib/mplfinance
import mplfinance as mpf
# df = 'ohlc dataframe'
mpf.plot(df)

how to plot bar gaps in pandas dataframe with timedelta and timestamp

Given a timestamped df with timedelta showing time covered such as:
df = pd.DataFrame(pd.to_timedelta(['00:45:00','01:00:00','00:30:00']).rename('span'),
index=pd.to_datetime(['2019-09-19 18:00','2019-09-19 19:00','2019-09-19 21:00']).rename('ts'))
# span
# ts
# 2019-09-19 18:00:00 00:45:00
# 2019-09-19 19:00:00 01:00:00
# 2019-09-19 21:00:00 00:30:00
How can I plot a bar graph showing dropouts every 15 minutes? What I want is a bar graph with 0 or 1 on the Y axis: 1 for each 15-minute segment covered by the time periods above, and 0 for every 15-minute segment not covered.
Per this answer I tried:
df['span'].astype('timedelta64[m]').plot.bar()
However this plots each timespan vertically, and does not show that the whole hour of 2019-09-19 20:00 is missing.
I tried
df['span'].astype('timedelta64[m]').plot()
It plots the following, which is not very useful.
I also tried this answer to no avail.
Update
Based on lostCode's answer I was able to further modify the DataFrame as follows:
def isvalid(period):
    for ndx, row in df.iterrows():
        if (period.start_time >= ndx) and (period.start_time < row.end):
            return 1
    return 0
df['end']= df.index + df.span
ds = pd.period_range(df.index.min(), df.end.max(), freq='15T')
df_valid = pd.DataFrame(ds.map(isvalid).rename('valid'), index=ds.rename('period'))
Is there a better, more efficient way to do it?
You can use DataFrame.resample to create a new DataFrame with one row per hour, and then verify which of those time slots exist in the original data. To check, use isin:
import numpy as np
check=df.resample('H')['span'].sum().reset_index()
d=df.reset_index('ts').sort_values('ts')
check['valid']=np.where(check['ts'].isin(d['ts']),1,0)
check.set_index('ts')['valid'].plot(kind='bar',figsize=(10,10))
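Note that the hourly resample only checks whether some row starts in each hour, while the question asks about 15-minute coverage. A vectorized replacement for the iterrows loop, sketched under the assumption that each interval is half-open [ts, ts + span), compares every 15-minute slot start against all interval starts and ends at once via NumPy broadcasting. Memory grows with slots × rows, so very large frames may need a different approach.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'span': pd.to_timedelta(['00:45:00', '01:00:00', '00:30:00'])},
                  index=pd.DatetimeIndex(['2019-09-19 18:00', '2019-09-19 19:00',
                                          '2019-09-19 21:00'], name='ts'))

starts = df.index.to_numpy()
ends = (df.index.to_series() + df['span']).to_numpy()
last_end = pd.Timestamp(ends.max())

# every 15-minute slot start from the first start up to (excluding) the last end
slots = pd.date_range(df.index.min(), last_end, freq='15min')
slots = slots[slots < last_end]

# slot i is covered if start_j <= slot_i < end_j for at least one row j
s = slots.to_numpy()[:, None]
covered = ((s >= starts) & (s < ends)).any(axis=1)

valid = pd.Series(covered.astype(int), index=slots.rename('period'), name='valid')
valid.plot(kind='bar', figsize=(10, 4))
plt.show()
```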

plot score against timestamp in pandas

I have a dataframe in pandas:
date_hour score
2019041822 -5
2019041823 0
2019041900 6
2019041901 -5
where date_hour is in YYYYMMDDHH format, and score is an int.
When I plot, there is a long line connecting 2019041823 to 2019041900, because all the values in between are treated as absent (i.e. there is no score for 2019041824-2019041899, because those times do not exist).
Is there a way to ignore these gaps/absent values so that the plot is continuous? (Some of my data is missing 2 days, so I get a long, misleading line.)
The red circles show the gap between nights (i.e. between Apr 18 2300 and Apr 19 0000).
I used:
fig, ax = plt.subplots()
x=gpb['date_hour']
y=gpb['score']
ax.plot(x,y, '.-')
display(fig)
I believe it is because date_hour is an int. I tried converting it to str, but got this error: ValueError: x and y must have same first dimension
Is there a way to plot so there are no gaps?
Try converting date_hour to a timestamp with df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H') before plotting.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'date_hour':[2019041822, 2019041823, 2019041900, 2019041901],
                   'score':[-5,0,6,-5]})
df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')
df.plot(x='date_hour', y='score')
plt.show()
Output:
If you don't want to change your data, you can do
df = pd.DataFrame({'date_hour':[2019041822, 2019041823, 2019041900, 2019041901],
'score':[-5,0,6,-5]})
plt.plot(pd.to_datetime(df.date_hour, format='%Y%m%d%H'), df.score)
which gives:
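With real timestamps the 23:00 to 00:00 step is only one hour wide, but a line is still drawn across longer gaps such as the missing 2 days. If you would rather break the line there, one option, assuming hourly data, is to reindex to a regular hourly frequency with asfreq('h'): missing hours become NaN, and matplotlib leaves NaN as a gap. The last sample row (2019042103) is made up to create a gap.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901, 2019042103],
                   'score': [-5, 0, 6, -5, 2]})  # last row is hypothetical, to create a gap
df['date_hour'] = pd.to_datetime(df['date_hour'].astype(str), format='%Y%m%d%H')

# one row per hour; hours without data become NaN
hourly = df.set_index('date_hour')['score'].asfreq('h')

fig, ax = plt.subplots()
ax.plot(hourly.index, hourly.values, '.-')  # NaN values break the line
plt.show()
```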

Tick labels overlap in pandas bar chart

TL;DR: In pandas how do I plot a bar chart so that its x axis tick labels look like those of a line chart?
I made a time series with evenly spaced intervals (one item per day) and can plot it just fine like this:
intensity[350:450].plot()
plt.show()
But switching to a bar chart created this mess:
intensity[350:450].plot(kind = 'bar')
plt.show()
I then created a bar chart using matplotlib directly but it lacks the nice date time series tick label formatter of pandas:
def bar_chart(series):
    fig, ax = plt.subplots(1)
    ax.bar(series.index, series)
    fig.autofmt_xdate()
    plt.show()
bar_chart(intensity[350:450])
Here's an excerpt from the intensity Series:
intensity[390:400]
2017-03-07 3
2017-03-08 0
2017-03-09 3
2017-03-10 0
2017-03-11 0
2017-03-12 0
2017-03-13 2
2017-03-14 0
2017-03-15 3
2017-03-16 0
Freq: D, dtype: int64
I could go all out and create the tick labels completely by hand, but I'd rather not have to babysit matplotlib; I'd like to let pandas do its job and produce what it did in the very first figure, but with a bar plot. So how do I do that?
Pandas bar plots are categorical plots. They create one tick (+label) for each category. If the categories are dates and those dates are continuous one may aim at leaving certain dates out, e.g. to plot only every fifth category,
ax = series.plot(kind="bar")
ax.set_xticklabels([t if not i%5 else "" for i,t in enumerate(ax.get_xticklabels())])
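Since the tick labels of a pandas bar plot are just the string form of each index value, you can also build the labels from the index yourself, which formats the dates and keeps every fifth one in a single step. A sketch with randomly generated daily data (the interval of 5 and the date format are arbitrary):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

index = pd.date_range("2018-01-26", periods=100, freq="D")
series = pd.Series(np.random.rayleigh(size=100), index=index)

ax = series.plot(kind="bar")
# keep every fifth label, formatted as a plain date; blank out the others
labels = [d.strftime("%Y-%m-%d") if i % 5 == 0 else "" for i, d in enumerate(series.index)]
ax.set_xticklabels(labels)
plt.show()
```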
In contrast, matplotlib bar charts are numerical plots. Here a useful ticker can be applied, which ticks the dates weekly, monthly or however often is needed.
In addition, matplotlib gives you full control over the tick positions and their labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
index = pd.date_range("2018-01-26", "2018-05-05")
series = pd.Series(np.random.rayleigh(size=100), index=index)
plt.bar(series.index, series.values)
plt.gca().xaxis.set_major_locator(dates.MonthLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter("%b\n%Y"))
plt.show()

Pandas series stacked bar chart normalized

I have a pandas series with a multiindex like this:
my_series.head(5)
datetime_publication my_category
2015-03-31 xxx 24
yyy 2
zzz 1
qqq 1
aaa 2
dtype: int64
I am generating a horizontal bar chart using the plot method from pandas with all those stacked categorical values divided by datetime (according to the index hierarchy) like this:
my_series.unstack(level=1).plot.barh(
    stacked=True,
    figsize=(16,6),
    colormap='Paired',
    xlim=(0,10300),
    rot=45
)
plt.legend(
    bbox_to_anchor=(0., 1.02, 1., .102),
    loc=3,
    ncol=5,
    mode="expand",
    borderaxespad=0.
)
However, I can't find a way to normalize the values of the series broken down by datetime_publication and my_category. I would like all the horizontal bars to have the same length, but right now the length depends on the absolute values in the series.
Is there built-in pandas functionality to normalize the slices of the series, or a quick function to apply to the series that keeps track of the totals for each combination of the multiindex levels?
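One common approach (not a plot.barh option) is to unstack first and divide each row by its row total with DataFrame.div(..., axis=0), so every bar sums to 1. An equivalent on the series itself would be my_series / my_series.groupby(level=0).transform('sum'). A sketch with made-up counts shaped like my_series:

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical data: (datetime_publication, my_category) -> count
idx = pd.MultiIndex.from_tuples(
    [('2015-03-31', 'xxx'), ('2015-03-31', 'yyy'), ('2015-03-31', 'zzz'),
     ('2015-04-30', 'xxx'), ('2015-04-30', 'zzz')],
    names=['datetime_publication', 'my_category'])
my_series = pd.Series([24, 2, 1, 10, 30], index=idx)

# one row per publication date, one column per category; missing combos become 0
wide = my_series.unstack(level=1).fillna(0)
# divide every row by its total so each stacked bar has length 1
normalized = wide.div(wide.sum(axis=1), axis=0)

normalized.plot.barh(stacked=True, colormap='Paired', xlim=(0, 1), figsize=(16, 6))
plt.show()
```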