Pandas time series plot - setting custom ticks

I am creating a general-purpose average_week aggregation and plot tool using pandas. Everything works fine (I'd be glad to receive comments on that, too) except for the ticks: since I use "fake" dates, I want to replace the whole set of ticks with my own (I have already received questions about January 1 appearing on the timeline).
Yet it seems that pandas overwrites all the ticks, no matter what I pass afterwards. I was able to add the ticks I want, but I can't find how to erase the ones pandas creates.
import datetime

import matplotlib.pyplot as plt
import pandas as pd


def averageWeek(df, ax, tcol='ts', ccol='id', label=None, treshold=0,
                normalize=True, **kwargs):
    '''calculate average week on ts'''
    # rename to convention
    s = df[[tcol, ccol]].rename(columns={tcol: 'ts', ccol: 'id'})
    # count rows per 15-minute bin (older pandas: resample('15Min', how='count'))
    s = s.set_index('ts').resample('15Min').count().reset_index()
    s['id'] = s['id'].astype(float)
    # map every timestamp onto a "fake" week: Monday -> 2015-01-01, ..., Sunday -> 2015-01-07
    s['ts'] = s.ts.apply(lambda x: datetime.datetime(year=2015, month=1,
                                                     day=(x.weekday() + 1),
                                                     hour=x.hour,
                                                     minute=x.minute))
    s = s.groupby(['ts']).agg('mean')
    if s.id.sum() >= treshold:
        if normalize:
            s = 1.0 * s / s.sum()
        if label:
            s.rename(columns={'id': label}, inplace=True)
        s.plot(ax=ax, legend=False, **kwargs)
    else:
        print(label, "didn't pass threshold:", s.id.sum())
    return s
fig, ax = plt.subplots(figsize=(18,6))
aw = averageWeek(LMdata, ax=ax, label='Lower Manhattan', alpha=1, lw=1)
x = [datetime.datetime(year=2015, month=1, day=i) for i in range(1,8)]
labels = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
ax.get_xaxis().set_ticks([])
plt.xlabel('Average week')
plt.legend()

Your problem is that there are actually two kinds of tick labels involved here: major tick labels at the major ticks and minor tick labels at the minor ticks. You want to clear both of them. For example, if ax is the Axes in question, the following will work:
ax.set_xticklabels([],minor=False) # the default
ax.set_xticklabels([],minor=True)
You can then set the ticklabels and tick locations that you want.
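Putting the two steps together for the "fake week" from the question, a minimal sketch might look like the following. The random series `s` is only a stand-in for the aggregated data (an assumption, not the asker's data); the tick positions and labels are the `x` and `labels` lists from the question.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# stand-in data (assumption): one value per 15 minutes over the fake week
idx = pd.date_range("2015-01-01", "2015-01-07 23:45", freq="15min")
s = pd.Series(np.random.rand(len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(18, 6))
s.plot(ax=ax)

# clear both the major and the minor tick labels that pandas created
ax.set_xticklabels([], minor=False)
ax.set_xticklabels([], minor=True)

# then place one labelled major tick at the start of each fake day
x = [datetime.datetime(year=2015, month=1, day=i) for i in range(1, 8)]
labels = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
          'Friday', 'Saturday', 'Sunday']
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_xlabel('Average week')
```

Note that this only blanks the minor labels; the minor tick marks that pandas placed stay on the axis.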

Related

Iterating through a list of series for plotting

I have a list of series where each series is 100 observations of the time it took to parse some data.
list_series = [Series1, Series2, Series3...]
When I try to plot the series individually, I would do something like
# generate subplots
fig, ax = plt.subplots()
# histplot
sns.histplot(data=Series1, ax=ax, color='deepskyblue', kde=True, alpha=0.3)
# title
ax.set_title(f'Parsing times for Series1: Mean = {np.mean(Series1):0.4}, Median = {np.median(Series1):0.4}')
# fix overlap
fig.tight_layout()
Simple enough, and the plot looks good.
However, when I run pretty much the same code in a loop over list_series, the plots come out wonky:
list_series = [Series1, Series2, Series3...]
for i in list_series:
    fig, ax = plt.subplots()
    sns.histplot(data=i, ax=ax, color='deepskyblue', kde=True, alpha=0.3)
    ax.set_title(f'Parsing times for {i}: Mean = {np.mean(i):0.4}, Median = {np.median(i):0.4}')
    fig.tight_layout()
The name of the Series in the title gets replaced with the index number, and the head and tail of the series get squished above the actual plot...
(don't worry about mean/median not matching up, I did some order shuffling within the list during trials)
Is there something I'm not understanding as to how loops or plotting works?
Thank you.
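A likely cause, sketched here under assumptions: inside an f-string, `{i}` formats the Series object itself (its index, its values and a dtype line, truncated to head and tail for long series), not the name of the variable, so the title turns into a multi-line dump. One way around it is to give each Series a `name` and use that in the title. The series names and random data below are made up for illustration, and the sketch uses matplotlib's `ax.hist` in place of `sns.histplot` to keep dependencies small; the title logic is the same either way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical data: three named Series of 100 parse times each
list_series = [
    pd.Series(rng.gamma(2.0, 0.5, size=100), name=f"Series{k}")
    for k in (1, 2, 3)
]

titles = []
for s in list_series:
    fig, ax = plt.subplots()
    ax.hist(s, bins=20, color='deepskyblue', alpha=0.3)
    # s.name is a short label; f'{s}' would embed the whole Series repr
    title = (f'Parsing times for {s.name}: '
             f'Mean = {np.mean(s):0.4}, Median = {np.median(s):0.4}')
    ax.set_title(title)
    fig.tight_layout()
    titles.append(title)
    plt.close(fig)  # avoid piling up open figures in a loop
```

If the series cannot be named, iterating with `enumerate(list_series, start=1)` and putting the counter in the title is another option.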

Is it possible to break x and y axis at the same time on lineplot?

I am working on drawing lineplots with matplotlib.
I checked several posts and understand how an axis break works in matplotlib (Break // in x axis of matplotlib).
However, I was wondering whether it is possible to break the x and y axes at the same time.
My current drawing looks like the one below.
As the graph shows, the x-axis range [2000, 5000] wastes a lot of space.
Because I have more data to draw after 7000, I want to save more space.
Is it possible to split the x-axis together with the y-axis?
Or is there another convenient way to hide a specific region of a line plot?
If another library enables this, I am willing to drop matplotlib and adopt it...
Maybe splitting the axis isn't your best choice. I would perhaps try inserting another smaller figure into the open space of your large figure using add_axes(). Here is a small example.
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 5000, 1000) # create 1000 time stamps
data = 5*t*np.exp(-t/100) # and some fake data
fig, ax = plt.subplots()
ax.plot(t, data)
box = ax.get_position()
width = box.width*0.6
height = box.height*0.6
x = 0.35
y = 0.35
subax = fig.add_axes([x,y,width,height])
subax.plot(t, data)
subax.axis([0, np.max(t)/10, 0, np.max(data)*1.1])
plt.show()
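If a true break on both axes is still wanted, one approach that extends the usual broken-axis recipe is a 2x2 grid of subplots that share x per column and y per row, with each panel showing one window of the same data. This is only a sketch under assumptions: the data and the break limits (`x_ranges`, `y_ranges`) are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# made-up data: points near the origin and again after x = 7000
x = np.concatenate([np.linspace(0, 2000, 200), np.linspace(7000, 9000, 200)])
y = np.concatenate([np.linspace(0, 10, 200), np.linspace(90, 100, 200)])

x_ranges = [(0, 2000), (7000, 9000)]  # left and right columns
y_ranges = [(85, 105), (-5, 15)]      # top and bottom rows

fig, axes = plt.subplots(2, 2, sharex='col', sharey='row',
                         gridspec_kw={'wspace': 0.05, 'hspace': 0.05})

for row, (y_lo, y_hi) in enumerate(y_ranges):
    for col, (x_lo, x_hi) in enumerate(x_ranges):
        ax = axes[row, col]
        ax.plot(x, y, '.')  # same data in every panel, different window
        ax.set_xlim(x_lo, x_hi)
        ax.set_ylim(y_lo, y_hi)

# hide the spines and ticks that face the breaks
for ax in axes[0, :]:
    ax.spines['bottom'].set_visible(False)
    ax.tick_params(bottom=False)
for ax in axes[1, :]:
    ax.spines['top'].set_visible(False)
for ax in axes[:, 0]:
    ax.spines['right'].set_visible(False)
for ax in axes[:, 1]:
    ax.spines['left'].set_visible(False)
    ax.tick_params(left=False)
```

Markers are used instead of a connected line so that no segment is drawn across the skipped region.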

Adding grouping ticks to a bar chart

I have a chart created from a pandas DataFrame that looks like this:
I've formatted the ticks with:
ax = df.plot(kind='bar')
ax.set_xticklabels(df.index.strftime('%I %p'))
However, I'd like to add a second set of larger ticks, to achieve this kind of effect:
I've tried many variations of set_major_locator and set_major_formatter (as well as combining major and minor formatters), but it seems I'm not approaching it correctly, and I wasn't able to find useful examples of similar combined ticks online either.
Does someone have a suggestion on how to achieve something similar to the bottom image?
The dataframe has a datetime index and holds binned data, from something like df.resample(bin_size, label='right', closed='right').sum()
One idea is to set major ticks to display the date (%-d-%b) at noon each day with some padding (e.g., pad=40). This will leave a minor tick gap at noon, so for consistency you could set minor ticks only on the odd hours and give them rotation=90.
Note that this uses matplotlib's bar() since pandas' plot.bar() doesn't play well with the date formatting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# toy data
dates = pd.date_range('2021-08-07', '2021-08-10', freq='1H')
df = pd.DataFrame({'date': dates, 'value': np.random.randint(10, size=len(dates))}).set_index('date')
# pyplot bar instead of pandas bar
fig, ax = plt.subplots(figsize=(14, 4))
ax.bar(df.index, df.value, width=0.02)
# put day labels at noon
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[12]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%-d-%b'))
ax.xaxis.set_tick_params(which='major', pad=40)
# put hour labels on odd hours
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(1, 25, 2)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%-I %p'))
ax.xaxis.set_tick_params(which='minor', pad=0, rotation=90)
# add day separators at every midnight tick
ticks = df[df.index.strftime('%H:%M:%S') == '00:00:00'].index
arrowprops = dict(width=2, headwidth=1, headlength=1, shrink=0.02)
for tick in ticks:
    xy = (mdates.date2num(tick), 0) # convert date index to float coordinate
    xytext = (0, -65) # draw downward 65 points
    ax.annotate('', xy=xy, xytext=xytext, textcoords='offset points',
                annotation_clip=False, arrowprops=arrowprops)

Pandas: How can I plot with separate y-axis, but still control the order?

I am trying to plot multiple time series in one plot. The scales are different, so they need separate y-axis, and I want a specific time series to have its y-axis on the right. I also want that time series to be behind the others. But I find that when I use secondary_y=True, this time series is always brought to the front, even if the code to plot it comes before the others. How can I control the order of the plots when using secondary_y=True (or is there an alternative)?
Furthermore, when I use secondary_y=True the y-axis on the left no longer adapts to appropriate values. Is there a fix for this?
# imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# dummy data
lenx = 1000
x = range(lenx)
np.random.seed(4)
y1 = np.random.randn(lenx)
y1 = pd.Series(y1, index=x)
y2 = 50.0 + y1.cumsum()
# plot time series.
# use ax to make Pandas plot them in the same plot.
ax = y2.plot.area(secondary_y=True)
y1.plot(ax=ax)
So what I would like is to have the blue area plot behind the green time series, and to have the left y-axis take appropriate values for the green time series:
https://i.stack.imgur.com/6QzPV.png
Perhaps something like the following, using matplotlib.axes.Axes.twinx instead of secondary_y and then following the approach in this answer to move the twinned axis to the background (the primary axes get a higher zorder and an invisible patch so the area plot shows through):
# plot time series.
fig, ax = plt.subplots()
y1.plot(ax=ax, color='green')
ax.set_zorder(10)
ax.patch.set_visible(False)
ax1 = ax.twinx()
y2.plot.area(ax=ax1, color='blue')

How to display all the labels present on x and y axis in matplotlib [duplicate]

I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow.
The first time I tried it, it worked fine. All the numeric variables plotted and labeled, seen here:
fig = plt.figure(figsize=(15,8))
ax1 = fig.add_subplot(111)
plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
plt.colorbar()
labels = df.columns.tolist()
ax1.set_xticklabels(labels,rotation=90, fontsize=10)
ax1.set_yticklabels(labels,fontsize=10)
plt.show()
[Image: successful heatmap]
Later, I used get_dummies() on my categorical variable, like so:
df = pd.get_dummies(df, columns = ['sex'])
[Image: resulting correlation matrix]
So, if I reuse the code from before to generate a nice heatmap, it should be fine, right? Wrong!
[Image: What dumpster fire is this?]
So my question is, where did my labels go, and how do I get them back?!
Thanks!
To get your labels back, you can force matplotlib to use enough xticks so that all your labels can be shown. This can be done by adding
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
before your statements ax1.set_xticklabels(labels,rotation=90, fontsize=10) and ax1.set_yticklabels(labels,fontsize=10).
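For reference, a self-contained version of this fix might look like the sketch below. The random DataFrame is only a stand-in for the abalone data after get_dummies() (an assumption), and the labels are taken from the correlation matrix itself so that they always match the plotted rows and columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# stand-in data (assumption): 11 random numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 11)),
                  columns=[f'col_{k}' for k in range(11)])

corr = df.corr()
labels = corr.columns.tolist()

fig, ax1 = plt.subplots(figsize=(15, 8))
im = ax1.imshow(corr.to_numpy(), cmap='hot', interpolation='nearest')
fig.colorbar(im, ax=ax1)

# one tick per row/column, so every label has a tick to attach to
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
ax1.set_xticklabels(labels, rotation=90, fontsize=10)
ax1.set_yticklabels(labels, fontsize=10)
```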
This results in the following plot: