Iterating through a list of series for plotting - pandas

I have a list of series where each series is 100 observations of the time it took to parse some data.
list_series = [Series1, Series2, Series3...]
When I plot a series individually, I do something like this:
# generate subplots
fig, ax = plt.subplots()
# histplot
sns.histplot(data=Series1, ax=ax, color='deepskyblue', kde=True, alpha=0.3)
# title
ax.set_title(f'Parsing times for Series1: Mean = {np.mean(Series1):0.4}, Median = {np.median(Series1):0.4}')
# fix overlap
fig.tight_layout()
Simple enough, and the plot looks good.
However, when I run pretty much the same code in a loop over list_series, the plots come out wonky:
list_series = [Series1, Series2, Series3...]
for i in list_series:
    fig, ax = plt.subplots()
    sns.histplot(data=i, ax=ax, color='deepskyblue', kde=True, alpha=0.3)
    ax.set_title(f'Parsing times for {i}: Mean = {np.mean(i):0.4}, Median = {np.median(i):0.4}')
    fig.tight_layout()
The name of the Series in the title gets replaced with the index number, and the head and tail of the series get squished above the actual plot...
(don't worry about mean/median not matching up, I did some order shuffling within the list during trials)
Is there something I'm not understanding about how loops or plotting work?
Thank you.
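A likely cause, sketched under assumptions: inside an f-string, {i} formats the entire Series (index, values, and the dtype footer), which would explain index numbers and the head and tail of the data showing up in the title. Using the Series' name attribute gives a short label instead. The sketch below assumes each Series was created with a name; the names and the random data are made up, make_title is a hypothetical helper, and ax.hist stands in for sns.histplot only to keep the example free of the seaborn dependency.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: three named Series of 100 parsing times each.
list_series = [
    pd.Series(rng.gamma(2.0, 0.5, size=100), name=f"Series{k}")
    for k in (1, 2, 3)
]


def make_title(s):
    """Build a one-line title from the Series name, not from its full repr."""
    return (f"Parsing times for {s.name}: "
            f"Mean = {s.mean():0.4}, Median = {s.median():0.4}")


figures = []
for s in list_series:
    fig, ax = plt.subplots()
    # ax.hist is a stand-in for sns.histplot(data=s, ax=ax, ...).
    ax.hist(s, bins=20, color="deepskyblue", alpha=0.3)
    ax.set_title(make_title(s))
    fig.tight_layout()
    figures.append(fig)
```

If the Series have no name, iterating over zip(names, list_series) with a separate list of labels would be another way to get the same effect.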

Related

Pandas: How can I plot with separate y-axis, but still control the order?

I am trying to plot multiple time series in one plot. The scales are different, so they need separate y-axis, and I want a specific time series to have its y-axis on the right. I also want that time series to be behind the others. But I find that when I use secondary_y=True, this time series is always brought to the front, even if the code to plot it comes before the others. How can I control the order of the plots when using secondary_y=True (or is there an alternative)?
Furthermore, when I use secondary_y=True the y-axis on the left no longer adapts to appropriate values. Is there a fix for this?
# imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# dummy data
lenx = 1000
x = range(lenx)
np.random.seed(4)
y1 = np.random.randn(lenx)
y1 = pd.Series(y1, index=x)
y2 = 50.0 + y1.cumsum()
# plot time series.
# use ax to make Pandas plot them in the same plot.
ax = y2.plot.area(secondary_y=True)
y1.plot(ax=ax)
So what I would like is to have the blue area plot behind the green time series, and to have the left y-axis take appropriate values for the green time series:
https://i.stack.imgur.com/6QzPV.png
Perhaps something like the following using matplotlib.axes.Axes.twinx instead of using secondary_y, and then following the approach in this answer to move the twinned axis to the background:
# plot time series.
fig, ax = plt.subplots()
y1.plot(ax=ax, color='green')
ax.set_zorder(10)
ax.patch.set_visible(False)
ax1 = ax.twinx()
y2.plot.area(ax=ax1, color='blue')

How to display all the labels present on x and y axis in matplotlib [duplicate]

I'm playing around with the abalone dataset from UCI's machine learning repository. I want to display a correlation heatmap using matplotlib and imshow.
The first time I tried it, it worked fine. All the numeric variables plotted and labeled, seen here:
fig = plt.figure(figsize=(15,8))
ax1 = fig.add_subplot(111)
plt.imshow(df.corr(), cmap='hot', interpolation='nearest')
plt.colorbar()
labels = df.columns.tolist()
ax1.set_xticklabels(labels,rotation=90, fontsize=10)
ax1.set_yticklabels(labels,fontsize=10)
plt.show()
successful heatmap
Later, I used get_dummies() on my categorical variable, like so:
df = pd.get_dummies(df, columns = ['sex'])
resulting correlation matrix
So, if I reuse the code from before to generate a nice heatmap, it should be fine, right? Wrong!
What dumpster fire is this?
So my question is, where did my labels go, and how do I get them back?!
Thanks!
To get your labels back, you can force matplotlib to use enough xticks so that all your labels can be shown. This can be done by adding
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
before your statements ax1.set_xticklabels(labels,rotation=90, fontsize=10) and ax1.set_yticklabels(labels,fontsize=10).
This results in the following plot:
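Put together, the fix might look like the sketch below. The DataFrame here is random stand-in data with made-up column names (it is not the abalone dataset), so only the tick handling is meant to carry over.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the numeric columns plus the dummy-encoded 'sex' columns.
columns = ["length", "diameter", "height", "rings", "sex_F", "sex_I", "sex_M"]
df = pd.DataFrame(rng.random((50, len(columns))), columns=columns)

corr = df.corr()
labels = corr.columns.tolist()

fig = plt.figure(figsize=(15, 8))
ax1 = fig.add_subplot(111)
im = ax1.imshow(corr.to_numpy(), cmap="hot", interpolation="nearest")
fig.colorbar(im)

# One tick per row/column, so that every label has a tick to attach to.
ax1.set_xticks(np.arange(len(labels)))
ax1.set_yticks(np.arange(len(labels)))
ax1.set_xticklabels(labels, rotation=90, fontsize=10)
ax1.set_yticklabels(labels, fontsize=10)
```

Without the two set_xticks / set_yticks calls, matplotlib picks its own tick positions, and the label list is applied to those instead of to one tick per column.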

Pandas time series plot - setting custom ticks

I am creating a general-purpose average_week aggregation and plotting tool using pandas. Everything works fine (I'd be glad to receive comments on that, too) except the ticks: since I "fake" the dates, I want to replace the whole set of ticks with homebrewed ones (I have already received some questions about January 1 on the timeline).
Yet it seems that pandas overwrites all the ticks, no matter what I pass afterwards. I was able to add the ticks I want, but I can't find how to erase the pandas ones.
import datetime
import matplotlib.pyplot as plt

def averageWeek(df, ax, tcol='ts', ccol='id', label=None, treshold=0,
                normalize=True, **kwargs):
    '''calculate average week on ts'''
    s = df[[tcol, ccol]].rename(columns={tcol: 'ts', ccol: 'id'})  # rename to convention
    # count events per 15-minute bin
    s = s[['id', 'ts']].set_index('ts').resample('15Min').count().reset_index()
    s['id'] = s['id'].astype(float)
    # map every timestamp onto a "fake" week, January 1-7, 2015
    s['ts'] = s.ts.apply(lambda x: datetime.datetime(year=2015, month=1,
                                                     day=(x.weekday() + 1),
                                                     hour=x.hour,
                                                     minute=x.minute))
    s = s.groupby(['ts']).agg('mean')
    if s.id.sum() >= treshold:
        if normalize:
            s = 1.0 * s / s.sum()
        if label:
            s.rename(columns={'id': label}, inplace=True)
        s.plot(ax=ax, legend=False, **kwargs)
    else:
        print(label, 'didnt pass treshhold:', s['id'].sum())
    return s
fig, ax = plt.subplots(figsize=(18,6))
aw = averageWeek(LMdata, ax=ax, label='Lower Manhattan', alpha=1, lw=1)
x = [datetime.datetime(year=2015, month=1, day=i) for i in range(1,8)]
labels = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
ax.get_xaxis().set_ticks([])
plt.xlabel('Average week')
plt.legend()
Your problem is that there are actually two kinds of tick labels involved in this: major and minor ticklabels, at major and minor ticks. You want to clear both of them. For example, if ax is the axis in question, the following will work:
ax.set_xticklabels([],minor=False) # the default
ax.set_xticklabels([],minor=True)
You can then set the ticklabels and tick locations that you want.
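A minimal sketch of that sequence follows. To stay self-contained it uses plain matplotlib with a numeric x axis in hours rather than a pandas time-series plot; the labelled minor ticks are configured by hand as a stand-in for the ones pandas adds, and the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter, MultipleLocator

# Hypothetical "average week": one value per 15 minutes, x measured in hours.
hours = np.arange(0, 7 * 24, 0.25)
values = 1 + np.sin(hours / 24 * 2 * np.pi)

fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(hours, values)

# Stand-in for the labelled minor ticks that a pandas time-series plot sets up.
ax.xaxis.set_minor_locator(MultipleLocator(6))
ax.xaxis.set_minor_formatter(FormatStrFormatter("%dh"))

# Clear both kinds of tick labels.
ax.set_xticklabels([], minor=False)  # the default
ax.set_xticklabels([], minor=True)

# Then set the tick locations and labels that are actually wanted.
labels = ["Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday"]
ax.set_xticks(np.arange(7) * 24)  # start of each day, in hours
ax.set_xticklabels(labels)
ax.set_xlabel("Average week")
```

With a real pandas plot the tick positions would need to be given in whatever units that axis uses, which this sketch does not attempt to show.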

Matplotlib Subplots -- Get Rid of Tick Labels Altogether

Is there a way to get rid of tick labels altogether when creating an array of subplots in Matplotlib? I am currently needing to specify each plot based on the row and column of a larger data set to which the plot corresponds. I've attempted to use the ax.set_xticks([]) and the similar y-axis command, to no avail.
I recognize that it's probably an unusual request to want to make a plot with no axis data whatsoever, but that's what I need. And I need it to automatically apply to all of the subplots in the array.
You have the right method. Maybe you are not applying the set_xticks to the correct axes.
An example:
import matplotlib.pyplot as plt
import numpy as np
ncols = 5
nrows = 3
# create the plots
fig = plt.figure()
# subplot indices start at 1, hence the "+ 1"
axes = [ fig.add_subplot(nrows, ncols, r * ncols + c + 1) for r in range(0, nrows) for c in range(0, ncols) ]
# add some data
for ax in axes:
    ax.plot(np.random.random(10), np.random.random(10), '.')
# remove the x and y ticks
for ax in axes:
    ax.set_xticks([])
    ax.set_yticks([])
This gives:
Note that each axis instance is stored in a list (axes) and then they can be easily manipulated. As usual, there are several ways of doing this, this is just an example.
Even more concise than @DrV's answer, remixing @mwaskom's comment, a complete and total one-liner to get rid of all axes in all subplots:
# do some plotting...
plt.subplot(121),plt.imshow(image1)
plt.subplot(122),plt.imshow(image2)
# ....
# one liner to remove *all axes in all subplots*
plt.setp(plt.gcf().get_axes(), xticks=[], yticks=[]);
Note: this must be called before any calls to plt.show()
The commands are the same for subplots
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.plot([1,2])
ax1.tick_params(
    axis='x',           # changes apply to the x-axis
    which='both',       # both major and minor ticks are affected
    bottom=False,       # ticks along the bottom edge are off
    top=False,          # ticks along the top edge are off
    labelbottom=False   # labels along the bottom edge are off
)
plt.draw()
You can get rid of the default subplot x and y ticks simply by running the following code:
fig, ax = plt.subplots()
ax.xaxis.set_major_locator(plt.NullLocator())
ax.yaxis.set_major_locator(plt.NullLocator())
for i in range(3):
    ax = fig.add_subplot(3, 1, i+1)
    ...
Just by adding the 2 aforementioned lines just after fig, ax = plt.subplots() you can remove the default ticks.
One can remove the xticks or yticks by
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
If you also want to turn off the spines, so as to have no axis at all, you can use:
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
And if you want to turn everything off at once, use:
ax.axis("off")
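Any of the options above can be applied to a whole grid in one loop. The sketch below assumes the grid is created with plt.subplots, which returns the axes as a NumPy array; the 3 x 5 layout and the random data are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3 x 5 grid of subplots.
fig, axes = plt.subplots(nrows=3, ncols=5)

for ax in axes.flat:  # .flat visits every Axes regardless of row and column
    ax.plot(rng.random(10), rng.random(10), ".")
    ax.set_xticks([])  # no x ticks, hence no x tick labels
    ax.set_yticks([])  # no y ticks, hence no y tick labels
```

Swapping the two set_*ticks calls for ax.axis("off") inside the same loop would also hide the frame of each subplot.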

How to draw an errorplot and a boxplot sharing x and y axes

I am used to reading programming documentation, but I have to admit that when it comes to matplotlib, I get really lost and confused. I just want to draw 2 sets of data that share the same x and y axes, with one set drawn as a boxplot and the other as error bars. I've tried setting hold to true or cloning the x axis, but every time only one of the datasets gets drawn.
Could someone share some simple code I could mimic?
Here is basically what I do:
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax = ax1.twinx()
ax.errorbar( ... )
ax = ax1.twinx()
ax.boxplot(...)
plt.show()
My question is really similar to this one: add boxplot to other graph in python, whose answer doesn't work.
Best regards
You can absolutely do this, and you don't need to do anything special like make multiple x or y axes; since you want to plot them both on the same set of axes, you don't have to change anything.
One thing that you need to keep in mind is that the x-axis of a boxplot is range(1, num_boxes + 1), which might not be what you expect.
Here's an example using random data.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(4)
y = np.random.randn(20, 4)
plt.boxplot(y)
plt.errorbar(x, np.mean(y, axis=0), yerr=np.std(y, axis=0))
It may be difficult to see that this is working, but if you offset the x values you'll see that it is drawing error bars.
plt.boxplot(y)
plt.errorbar(x + 0.5, np.mean(y, axis=0), yerr=np.std(y, axis=0))
Finally, you can add 1 to x to get what you are probably wanting.
plt.boxplot(y)
plt.errorbar(x + 1, np.mean(y, axis=0), yerr=np.std(y, axis=0))
Not sure why you are calling twinx every time, I think you just want to do:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.errorbar( ... )
ax.boxplot(...)
plt.show()
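Combining both answers, a self-contained sketch could look like this. The data is random, and the marker, color, and cap size are arbitrary choices made only so the error bars stand out against the boxes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 4 groups of 20 observations each.
y = rng.standard_normal((20, 4))

# Boxplots are drawn at x = 1..num_boxes, so reuse those positions for the error bars.
positions = np.arange(1, y.shape[1] + 1)

fig, ax = plt.subplots()
box = ax.boxplot(y, positions=positions)
err = ax.errorbar(positions, y.mean(axis=0), yerr=y.std(axis=0),
                  fmt="o", color="red", capsize=4)
```

Both artists live on the single Axes ax, so they share the x and y axes without any call to twinx.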