Adding grouping ticks to a bar chart - pandas

I have a chart created from a pandas DataFrame that looks like this:
I've formatted the ticks with:
ax = df.plot(kind='bar')
ax.set_xticklabels(df.index.strftime('%I %p'))
However, I'd like to add a second set of larger ticks, to achieve this kind of effect:
I've tried many variations of set_major_locator and set_major_formatter (as well as combining major and minor formatters), but it seems I'm not approaching it correctly, and I wasn't able to find useful examples of similar combined ticks online either.
Does someone have a suggestion on how to achieve something similar to the bottom image?
The DataFrame has a datetime index and holds binned data, from something like df.resample(bin_size, label='right', closed='right').sum().

One idea is to set major ticks to display the date (%-d-%b) at noon each day with some padding (e.g., pad=40). This will leave a minor tick gap at noon, so for consistency you could set minor ticks only on the odd hours and give them rotation=90.
Note that this uses matplotlib's bar() since pandas' plot.bar() doesn't play well with the date formatting.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# toy data
dates = pd.date_range('2021-08-07', '2021-08-10', freq='1H')
df = pd.DataFrame({'date': dates, 'value': np.random.randint(10, size=len(dates))}).set_index('date')
# pyplot bar instead of pandas bar
fig, ax = plt.subplots(figsize=(14, 4))
ax.bar(df.index, df.value, width=0.02)
# put day labels at noon
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[12]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%-d-%b'))
ax.xaxis.set_tick_params(which='major', pad=40)
# put hour labels on odd hours
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(1, 25, 2)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%-I %p'))
ax.xaxis.set_tick_params(which='minor', pad=0, rotation=90)
# add day separators at every midnight tick
ticks = df[df.index.strftime('%H:%M:%S') == '00:00:00'].index
arrowprops = dict(width=2, headwidth=1, headlength=1, shrink=0.02)
for tick in ticks:
    xy = (mdates.date2num(tick), 0)  # convert date index to float coordinate
    xytext = (0, -65)  # draw downward 65 points
    ax.annotate('', xy=xy, xytext=xytext, textcoords='offset points',
                annotation_clip=False, arrowprops=arrowprops)
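The remark about pandas' plot.bar() can be checked with a small sketch. The toy frame and variable names below are illustrative, not taken from the question. It suggests that pandas draws the bars at integer positions 0..n-1 and only uses the dates as labels, whereas ax.bar() places the bars at the date values themselves, which is what the date locators need.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# toy data (assumption: six daily values)
dates = pd.date_range("2021-08-07", periods=6, freq="D")
df = pd.DataFrame({"value": np.arange(1, 7)}, index=dates)

# pandas bar plot: bars are centred on 0, 1, 2, ... regardless of the dates
ax_pd = df.plot(kind="bar", legend=False)
pandas_x = [p.get_x() + p.get_width() / 2 for p in ax_pd.patches]

# matplotlib bar plot: bars are centred on the dates converted to float day numbers
fig, ax_mpl = plt.subplots()
bars = ax_mpl.bar(df.index, df["value"], width=0.5)
mpl_x = [p.get_x() + p.get_width() / 2 for p in bars.patches]

print(pandas_x)  # small integers
print(mpl_x)     # large date numbers
```

Because the pandas bars sit on a categorical integer axis, an HourLocator or DateFormatter has no real dates to work with there.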

Related

How to show min and max values at the end of the axes

I generate plots like below:
from pylab import *
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
import matplotlib.ticker as ticker
rcParams['axes.linewidth'] = 2 # set the value globally
rcParams['font.size'] = 16# set the value globally
rcParams['font.family'] = ['DejaVu Sans']
rcParams['mathtext.fontset'] = 'stix'
rcParams['legend.fontsize'] = 24
rcParams['axes.prop_cycle'] = cycler(color=['grey','b','g','r','orange'])
rc('lines', linewidth=2, linestyle='-',marker='o')
rcParams['axes.xmargin'] = 0
rcParams['axes.ymargin'] = 0
t = arange(0,21,1)
v = 2.0
s = v*t
plt.figure(figsize=(12, 4))
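plt.plot(t,s,label=r'$s=%1.1f\cdot t$'%v)  # raw string so \c is not treated as an escape
plt.title(r'Wykres drogi w czasie $s=v\cdot t$')  # "Distance vs. time plot"
plt.xlabel('Czas $t$, s')  # "Time t, s"
plt.ylabel('Droga $s$, m')  # "Distance s, m"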
plt.autoscale(enable=True, axis='both', tight=None)
legend(loc='best')
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.grid()
plt.show()
When I change t = arange(0,21,1) to, for example, t = arange(0,20,1), which makes the maximum x value 19.0, the maximum value disappears from the x axis. The same happens on the y axis, of course.
My question is: how can I force matplotlib to always produce plots where the maximum values appear right at the ends of the axes, as I always need for my purposes, or at least to make that selectable as an option?
Image from a program I wrote in Fortran some years ago
Matplotlib is more efficient, which is why I use it, but there should be an option like that (the picture above).
As things stand, I can always look up the max and min in a text window or take additional steps to check them, but I would like to read them from the axes. So the question is: is there such a possibility in matplotlib? If not, I will close the post.
Axes I am thinking about, more or less
I see two ways to solve the problem.
Set the axes automatic limit mode to round numbers
In the rcParams you can do this with
rcParams['axes.autolimit_mode'] = 'round_numbers'
and turn off the manual axes limits by removing these two lines:
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
This will produce the image below. Still, the extreme values of the axes are shown at the nearest "round numbers", but the user can approximately catch the data range limits. If you need the exact value to be displayed, you can see the second solution which cannot be directly used from the rcParams.
or – Manually generate axes ticks
This solution implies explicitly asking for a given number of ticks. There is probably a way to automate it depending on the axes size etc., but if you are dealing with more or less the same graph size every time, you can decide on a fixed number of ticks manually. This can be done with
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.xticks(np.linspace(t.min(), t.max(), 7)) # arbitrary chosen
plt.yticks(np.linspace(s.min(), s.max(), 5)) # arbitrary chosen
This generates the image below, which is quite similar to your example image.
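One way the tick choice might be automated is to keep matplotlib's automatic ticks and add the data minimum and maximum to them. This is only a sketch under that assumption: the helper name ticks_with_endpoints is hypothetical, and an automatic tick that falls very close to an endpoint may still produce overlapping labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def ticks_with_endpoints(auto_ticks, lo, hi):
    """Keep the automatic ticks strictly inside (lo, hi) and add lo and hi themselves.

    Hypothetical helper, not part of matplotlib.
    """
    inner = [tick for tick in auto_ticks if lo < tick < hi]
    return np.array([lo, *inner, hi], dtype=float)


t = np.arange(0, 20, 1)
s = 2.0 * t

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(t, s)
ax.set_xlim(t.min(), t.max())
ax.set_ylim(s.min(), s.max())

# read the ticks matplotlib picked, then force the exact data limits into them
xticks = ticks_with_endpoints(ax.get_xticks(), t.min(), t.max())
yticks = ticks_with_endpoints(ax.get_yticks(), s.min(), s.max())
ax.set_xticks(xticks)
ax.set_yticks(yticks)
ax.grid()
```

With this, the first and last tick on each axis are the data minimum and maximum, whatever range t covers.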

Changed frequency of ticks in Pandas '.bar' plot, but messed up the actual bars

How's your self-isolation going?
Mine rocks, as I'm drilling through visualization in Python. Recently, however, I've run into an issue.
I figured that .plot.bar() in Pandas has an uncommon formatting of x-axis (which kinda confirms that I read before I ask). I had price data with monthly frequency, so I applied a fix to display only yearly ticks in a bar chart:
fig, ax = plt.subplots()
ax.bar(btc_returns.index, btc_returns)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Where btc_returns is a Series object with datetime in index.
The output I got was weird. Here are the screenshots of what I expected vs the end result.
I tried to find a solution to this, but no luck. Can you guys please give me a hand? Thanks! Criticism is welcome as always :)
And my solution is like this:
fig, ax = plt.subplots(figsize=(15,7))
ax.bar(btc_returns.index, btc_returns.returns.values, width = 1)
Where btc_returns is a DataFrame with the returns of BTC. I figured that .values makes the bar plot read the datetime input correctly. For the 'missing' bars - their resolution was just way too small, so I set the width to '1'.
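To illustrate why the width matters: on a date axis the bar width is measured in days, so the default of 0.8 is a sliver when the data points are a month apart. The sketch below uses made-up monthly returns (not the real BTC data), and the width of 20 days is an arbitrary choice for monthly bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up monthly returns standing in for btc_returns (assumption)
idx = pd.date_range("2018-01-01", periods=36, freq="MS")
returns = pd.Series(np.random.default_rng(0).normal(size=len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(15, 7))
# width is in days on a date axis; about 20 days nearly fills a month
bars = ax.bar(returns.index, returns.values, width=20)

# one labelled tick per year, as in the question
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
```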
Using the stock value data from Yahoo Finance: Bitcoin USD
Technically, you can do pd.to_datetime(btc.Date).dt.date at the beginning, but resample won't work, which is why btc_monthly.index.date is done as a second step.
resample can happen over different periods (e.g. 2M = every two months)
Load and transform the data
import pandas as pd
import matplotlib.pyplot as plt
# load data
btc = pd.read_csv('data/BTC-USD.csv')
# Date to datetime
btc.Date = pd.to_datetime(btc.Date)
# calculate daily return %
btc['return'] = ((btc.Close - btc.Close.shift(1))/btc.Close.shift(1))*100
# resample to monthly and aggregate by sum
btc_monthly = btc.resample('M', on='Date').sum()
# set the index to be date only (no time)
btc_monthly.index = btc_monthly.index.date
Plot
btc_monthly.plot(y='return', kind='bar', figsize=(15, 8))
plt.show()
Plot Bimonthly
btc_monthly = btc.resample('2M', on='Date').sum() # instead of 'M'
btc_monthly.index = btc_monthly.index.date
btc_monthly.plot(y='return', kind='bar', figsize=(15, 8), legend=False)
plt.title('Bitcoin USD: Bimonthly % Return')
plt.ylabel('% return')
plt.xlabel('Date')
plt.show()

Pandas: How can I plot with separate y-axis, but still control the order?

I am trying to plot multiple time series in one plot. The scales are different, so they need separate y-axis, and I want a specific time series to have its y-axis on the right. I also want that time series to be behind the others. But I find that when I use secondary_y=True, this time series is always brought to the front, even if the code to plot it comes before the others. How can I control the order of the plots when using secondary_y=True (or is there an alternative)?
Furthermore, when I use secondary_y=True the y-axis on the left no longer adapts to appropriate values. Is there a fix for this?
# imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# dummy data
lenx = 1000
x = range(lenx)
np.random.seed(4)
y1 = np.random.randn(lenx)
y1 = pd.Series(y1, index=x)
y2 = 50.0 + y1.cumsum()
# plot time series.
# use ax to make Pandas plot them in the same plot.
ax = y2.plot.area(secondary_y=True)
y1.plot(ax=ax)
So what I would like is to have the blue area plot behind the green time series, and to have the left y-axis take appropriate values for the green time series:
https://i.stack.imgur.com/6QzPV.png
Perhaps something like the following using matplotlib.axes.Axes.twinx instead of using secondary_y, and then following the approach in this answer to move the twinned axis to the background:
# plot time series.
fig, ax = plt.subplots()
y1.plot(ax=ax, color='green')
ax.set_zorder(10)
ax.patch.set_visible(False)
ax1 = ax.twinx()
y2.plot.area(ax=ax1, color='blue')

Pandas time series plot - setting custom ticks

I am creating a general-purpose average_week aggregation and plot tool using pandas. Everything works fine (I'd be glad to receive comments on that, too) except the ticks: as I "fake" dates, I want to replace the whole set of ticks with homebrewed ones (I have already received some questions regarding January 1 on the timeline).
Yet it seems that pandas overwrites all the ticks, no matter what I pass afterwards. I was able to add the ticks I want, yet I can't find how to erase the pandas ones.
import datetime
import matplotlib.pyplot as plt

def averageWeek(df, ax, tcol='ts', ccol='id', label=None, treshold=0,
                normalize=True, **kwargs):
    '''calculate average week on ts'''
    s = df[[tcol, ccol]].rename(columns={tcol: 'ts', ccol: 'id'})  # rename to convention
    s = s[['id', 'ts']].set_index('ts').resample('15min').count().reset_index()
    s['id'] = s['id'].astype(float)
    # map every timestamp onto a "fake" week: Monday -> 2015-01-01, ..., Sunday -> 2015-01-07
    s['ts'] = s.ts.apply(lambda x: datetime.datetime(year=2015, month=1,
                                                     day=(x.weekday() + 1),
                                                     hour=x.hour,
                                                     minute=x.minute))
    s = s.groupby(['ts']).agg('mean')
    if s['id'].sum() >= treshold:
        if normalize:
            s = 1.0 * s / s.sum()
        if label:
            s.rename(columns={'id': label}, inplace=True)
        s.plot(ax=ax, legend=False, **kwargs)
    else:
        print(label, "didn't pass threshold:", s['id'].sum())
    return s
fig, ax = plt.subplots(figsize=(18,6))
aw = averageWeek(LMdata, ax=ax, label='Lower Manhattan', alpha=1, lw=1)
x = [datetime.datetime(year=2015, month=1, day=i) for i in range(1,8)]
labels = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
ax.get_xaxis().set_ticks([])
plt.xlabel('Average week')
plt.legend()
Your problem is that two kinds of tick labels are involved here: major and minor tick labels, at the major and minor ticks. You want to clear both of them. For example, if ax is the Axes in question, the following will work:
ax.set_xticklabels([],minor=False) # the default
ax.set_xticklabels([],minor=True)
You can then set the ticklabels and tick locations that you want.
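A minimal sketch of that last step, with made-up data in place of the averaged week. It assumes the series is plotted with x_compat=True so that the x axis uses ordinary matplotlib date numbers. Without that option pandas uses its own period-based axis, and the tick positions would need to be given in those units instead.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up "average week" on the fake dates 2015-01-01 .. 2015-01-07 (assumption)
idx = pd.date_range("2015-01-01", "2015-01-07 23:45", freq="15min")
s = pd.Series(np.random.default_rng(0).random(len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(18, 6))
s.plot(ax=ax, x_compat=True)  # x_compat=True: plain matplotlib date axis

# remove the minor ticks and clear the major labels that came with the plot
ax.set_xticks([], minor=True)
ax.set_xticklabels([], minor=False)

# one major tick at the start of each fake day, labelled with the weekday name
days = [datetime.datetime(2015, 1, d) for d in range(1, 8)]
labels = ["Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday"]
ax.set_xticks(mdates.date2num(days))
ax.set_xticklabels(labels)
ax.set_xlabel("Average week")
```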

Reducing the distance between two boxplots

I'm drawing the boxplot shown below using Python and matplotlib. Is there any way I can reduce the distance between the two boxplots on the x axis?
This is the code that I'm using to get the figure above:
import os
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import rcParams
rcParams['ytick.direction'] = 'out'
rcParams['xtick.direction'] = 'out'
fig = plt.figure()
xlabels = ["CG", "EG"]
ax = fig.add_subplot(111)
ax.boxplot([values_cg, values_eg])
ax.set_xticks(np.arange(len(xlabels))+1)
ax.set_xticklabels(xlabels, rotation=45, ha='right')
fig.subplots_adjust(bottom=0.3)
ylabels = yticks = np.linspace(0, 20, 5)
ax.set_yticks(yticks)
ax.set_yticklabels(ylabels)
ax.tick_params(axis='x', pad=10)
ax.tick_params(axis='y', pad=10)
plt.savefig(os.path.join(output_dir, "output.pdf"))
And this is an example closer to what I'd like to get visually (although I wouldn't mind if the boxplots were even a bit closer to each other):
You can either change the aspect ratio of plot or use the widths kwarg (doc) as such:
ax.boxplot([values_cg, values_eg], widths=1)
to make the boxes wider.
Try changing the aspect ratio using
ax.set_aspect(1.5) # or some other float
The larger the number, the narrower (and taller) the plot should be:
a circle will be stretched such that the height is num times the width. aspect=1 is the same as aspect='equal'.
http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_aspect
When your code writes:
ax.set_xticks(np.arange(len(xlabels))+1)
you're putting the ticks at 1 and 2, which is where boxplot draws the two boxes by default (even though you change the tick labels afterwards), just as in the second, "wanted" example you gave, where they are set on 1, 2, 3.
So I think an alternative solution would be to play with the xticks positions and the xlim of the plot.
For example, widening the limits symmetrically around the boxes with
ax.set_xlim(-0.5, 3.5)
would make them appear closer together.
positions : array-like, optional
Sets the positions of the boxes. The ticks and limits are automatically set to match the positions. Defaults to range(1, N+1) where N is the number of boxes to be drawn.
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html
This should do the job!
As @Stevie mentioned, you can use the positions kwarg (doc) to manually set the x-coordinates of the boxes:
ax.boxplot([values_cg, values_eg], positions=[1, 1.3])
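A self-contained sketch of the positions approach, with random numbers standing in for values_cg and values_eg. The widths and x limits are arbitrary choices: narrower boxes keep the two from overlapping at a spacing of 0.3, and the explicit limits stop the axes from zooming in on the pair.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# made-up samples standing in for the question's data (assumption)
rng = np.random.default_rng(0)
values_cg = rng.normal(10, 2, 50)
values_eg = rng.normal(12, 3, 50)

positions = [1, 1.3]  # x-coordinates of the two boxes
fig, ax = plt.subplots()
bp = ax.boxplot([values_cg, values_eg], positions=positions, widths=0.2)

# label the boxes and leave some empty space on both sides
ax.set_xticks(positions)
ax.set_xticklabels(["CG", "EG"], rotation=45, ha="right")
ax.set_xlim(0.7, 1.6)

# centre of each median line, i.e. where each box actually sits
centers = [float(np.mean(m.get_xdata())) for m in bp["medians"]]
```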