Shift Matplotlib Axes to Match Overlaid Plots - pandas

I have a plot that overlays a scatter plot (red data points) on a boxplot. Everything is fine except that the red data points do not line up with the boxes along the x axis. I think this is because two different plot methods are used on the same axes. I have noticed that adding the scatter plot effectively shifts the boxplots to the right on the x axis; plotting the boxplot without the scatter plot does not shift the x-axis values. The method I'm using below should work, but it does not. Here is my code:
sitenames = df3.plant_name.unique().tolist()
months = ['JANUARY','FEBRUARY','MARCH','APRIL','MAY','JUNE','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER']
from datetime import datetime
monthn = datetime.now().month
newList = list()
for i in range(monthn-1):
    newList.append(months[(monthn-1+i)%12])
print(newList)
for i, month in enumerate(newList,1):
    #plt.figure()
    fig, ax = plt.subplots()
    ax = df3[df3['month']==i].boxplot(by='plant_name',column='Var')
    df3c[df3c['month']==i].plot(kind='scatter', x='plant_name', y='Var',ax=ax,color='r',label='CY')
    plt.xticks(rotation=90, ha='right')
    plt.suptitle('1991-2020 ERA5 WIND PRODUCTION',y=1)
    plt.title(months[i-1])
    plt.xlabel('SITE')
    plt.ylabel('VARIABILITY')
    plt.legend()
    plt.show()
Here is the plot from "February" that shows the mis-aligned x axis:
Here are partial rows for df3, df3c:
df3.head(3)
Out[223]:
      plant_name  month  year     power_kwh  power_kwh_rhs   Var
0  BII NEE STIPA      1  1991  11905.826075   14673.281223 -18.9
1  BII NEE STIPA      1  1992  14273.927688   14673.281223  -2.7
2  BII NEE STIPA      1  1993  12559.828360   14673.281223 -14.4
df3c.head(3)
Out[224]:
      plant_name  month    year     power_kwh  power_kwh_rhs   Var
0  BII NEE STIPA      1  2021.0  14863.643952   14673.281223   1.3
1  BII NEE STIPA      2  2021.0   9663.393155   12388.328084 -22.0
2  DOS ARBOLITOS      1  2021.0  36819.502285   36882.205762  -0.2
I have found a similar problem but can't see how to insert this solution to my code: Shift matplotlib axes to match eachother
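One likely cause, offered as a sketch rather than a confirmed diagnosis of the code above: Matplotlib places boxes at positions 1..N by default, while string categories passed as x values are placed at 0..N-1, so the scatter points end up one slot away from their boxes. The example below maps each site name to its box position and calls `ax.scatter` with those numbers. It also passes `ax=ax` to `boxplot` so both layers share the axes created by `plt.subplots()`. The frames `hist` and `current` and the site `SITE C` are hypothetical stand-ins for `df3` and `df3c`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df3 (history) and df3c (current year)
rng = np.random.default_rng(0)
sites = ["BII NEE STIPA", "DOS ARBOLITOS", "SITE C"]
hist = pd.DataFrame({
    "plant_name": np.repeat(sites, 30),
    "Var": rng.normal(0, 10, size=90),
})
current = pd.DataFrame({
    "plant_name": ["DOS ARBOLITOS", "SITE C", "BII NEE STIPA"],
    "Var": [-0.2, 4.0, 1.3],
})

fig, ax = plt.subplots()
# Draw the grouped boxplot on the axes we created; boxes sit at x = 1..N
hist.boxplot(by="plant_name", column="Var", ax=ax)

# Groups are drawn in sorted order, so map each name to its 1-based position
order = sorted(hist["plant_name"].unique())
position = {name: k for k, name in enumerate(order, start=1)}
xs = current["plant_name"].map(position)

# Plot the points at numeric positions instead of string categories
sc = ax.scatter(xs, current["Var"], color="r", label="CY", zorder=3)
ax.legend()
```

Because the points are plotted at numeric positions, the tick labels set by the boxplot stay in place and the order of rows in `current` does not matter.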

Related

Matplotlib Plot X, Y Line Plot Multiple Columns Fixed X Axis

I'm trying to plot a DataFrame with the x axis forced to the order 12, 1, 2 (Dec, Jan, Feb), and I cannot see how to do this. Matplotlib keeps plotting the x axis in 1, 2, 12 order. A few columns of my DataFrame (analogs_re) look like this:
   Month      2000      1995      2009      2014      1994      2003
0     12 -0.203835  0.580590  0.233124  0.490193  0.605808  0.016756
1      1 -0.947029 -1.239794 -0.977004  0.207236  0.436458 -0.501948
2      2 -0.059957  0.708626  0.111840  0.422534  1.051873 -0.149000
I need the y data plotted with the x axis in 12, 1, 2 order, as shown in the 'Month' column.
My code:
fig, ax = plt.subplots()
#for name, group in groups:
analogs_re.set_index('Month').plot(figsize=(10,5),grid=True)
analogs_re.plot(x='Month', y=analogs_re.columns[1:len(analogs_re.columns)])
When you set Month as the x-axis then obviously it's going to plot it in numerical order (0, 1, 2, 3...), because a sequential series does not start with 12, then 1, then 2, ...
The trick is to use the original index as x-axis, then label those ticks using the month number:
fig, ax = plt.subplots()
analogs_re.drop(columns='Month').plot(figsize=(10,5), grid=True, ax=ax)
ax.set_xticks(analogs_re.index)
ax.set_xticklabels(analogs_re["Month"])
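For reference, a self-contained version of the approach above that can be run as is. It uses the first two data columns from the question; the imports and the non-interactive backend are assumptions added so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# First two data columns from the question's table
analogs_re = pd.DataFrame({
    "Month": [12, 1, 2],
    "2000": [-0.203835, -0.947029, -0.059957],
    "1995": [0.580590, -1.239794, 0.708626],
})

fig, ax = plt.subplots(figsize=(10, 5))
# Plot against the row index (0, 1, 2) so the row order is preserved
analogs_re.drop(columns="Month").plot(grid=True, ax=ax)

# Put one tick on each row and label it with that row's month number
ax.set_xticks(analogs_re.index.to_list())
ax.set_xticklabels(analogs_re["Month"].astype(str).tolist())

labels = [t.get_text() for t in ax.get_xticklabels()]
```

The data are still positioned at 0, 1, 2; only the tick text changes, which is why the 12, 1, 2 order survives.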

Overlaying boxplots on the relative bin of a histogram

Taking the dataset 'tip' as an example
total_bill   tip  smoker  day  time    size
     16.99  1.01  No      Sun  Dinner     2
     10.34  1.66  No      Sun  Dinner     3
     21.01  3.50  No      Sun  Dinner     3
     23.68  3.31  No      Sun  Dinner     2
     24.59  3.61  No      Sun  Dinner     4
what I'm trying to do is represent the distribution of the variable 'total_bill' and relate each of its bins to the distribution of the variable 'tip' linked to it. In this example, this graph is meant to answer the question: "What is the distribution of tips left by customers as a function of the bill they paid?"
I have more or less achieved the graph I wanted to obtain (but there is a problem. At the end I explain what it is).
And the procedure I adopted is this:
Dividing 'total_bill' into bins.
tips['bins_total_bill'] = pd.cut(tips.total_bill, 10)
tips.head()
total_bill   tip  smoker  day  time    size  bins_total_bill
     16.99  1.01  No      Sun  Dinner     2  (12.618, 17.392]
     10.34  1.66  No      Sun  Dinner     3  (7.844, 12.618]
     21.01  3.50  No      Sun  Dinner     3  (17.392, 22.166]
     23.68  3.31  No      Sun  Dinner     2  (22.166, 26.94]
     24.59  3.61  No      Sun  Dinner     4  (22.166, 26.94]
Creation of a pd.Series with:
Index: pd.Interval of the total_bill bins
Values: number of occurrences
s = tips['bins_total_bill'].value_counts(sort=False)
s
(3.022, 7.844] 7
(7.844, 12.618] 42
(12.618, 17.392] 68
(17.392, 22.166] 51
(22.166, 26.94] 31
(26.94, 31.714] 19
(31.714, 36.488] 12
(36.488, 41.262] 7
(41.262, 46.036] 3
(46.036, 50.81] 4
Name: bins_total_bill, dtype: int64
Combine barplot and boxplot together
fig, ax1 = plt.subplots(dpi=200)
ax2 = ax1.twinx()
sns.barplot(ax=ax1, x = s.index, y = s.values)
sns.boxplot(ax=ax2, x='bins_total_bill', y='tip', data=tips)
sns.stripplot(ax=ax2, x='bins_total_bill', y='tip', data=tips, size=5, color="yellow", edgecolor='red', linewidth=0.3)
#Title and axis labels
ax1.tick_params(axis='x', rotation=90)
ax1.set_ylabel('Number of bills')
ax2.set_ylabel('Tips [$]')
ax1.set_xlabel("Mid value of total_bill bins [$]")
ax1.set_title("Tips ~ Total_bill distribution")
#Reference lines average(tip) + add yticks + Legend
avg_tip = np.mean(tips.tip)
ax2.axhline(y=avg_tip, color='red', linestyle="--", label="avg tip")
ax2.set_yticks(list(ax2.get_yticks()) + [avg_tip]) #append avg(tip) as an extra tick instead of shifting every tick
ax2.legend(loc='best')
#Set labels axis x
ax1.set_xticklabels(list(map(lambda s: round(s.mid,2), s.index)))
It has to be said that this graph has a problem! As the x-axis is categorical, I cannot, for example, add a vertical line at the mean value of 'total_bill'.
How can I fix this to get the correct result?
I also wonder if there is a correct and more streamlined approach than the one I have adopted.
I thought of this method, which is more compact than the previous one (it can probably be done better) and overcomes the problem of scaling on the x-axis.
I split 'total_bill' into bins and add the column to the DataFrame
tips['bins_total_bill'] = pd.cut(tips.total_bill, 10)
Group column 'tip' by previously created bins
obj_gby_tips = tips.groupby('bins_total_bill')['tip']
gby_tip = dict(list(obj_gby_tips))
Create dictionary with:
keys: midpoint of each bins interval
values: gby tips for each interval
mid_total_bill_bins = list(map(lambda bins: bins.mid, list(gby_tip.keys())))
gby_tips = gby_tip.values()
tip_gby_total_bill_bins = dict(zip(mid_total_bill_bins, gby_tips))
Create the chart, placing each box of the boxplot at the midpoint (centroid) of its bin
fig, ax1 = plt.subplots(dpi=200)
ax2 = ax1.twinx()
bp_values = list(tip_gby_total_bill_bins.values())
bp_pos = list(tip_gby_total_bill_bins.keys())
l1 = sns.histplot(tips.total_bill, bins=10, ax=ax1)
l2 = ax2.boxplot(bp_values, positions=bp_pos, manage_ticks=False, patch_artist=True, widths=2)
#Average tips as hline
avg_tip = np.mean(tips.tip)
ax2.axhline(y=avg_tip, color='red', linestyle="--", label="avg tip")
ax2.set_yticks(list(ax2.get_yticks()) + [avg_tip]) #add value of avg(tip) to y-axis
#Average total_bill as vline
avg_total_bill=np.mean(tips.total_bill)
ax1.axvline(x=avg_total_bill, color='orange', linestyle="--", label="avg tot_bill")
then the result.
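The same idea can be sketched with plain Matplotlib so it runs without seaborn or the tips dataset. This is an illustration under assumptions, not the code that produced the figure: `bill` and `tip` are hypothetical random data, and one set of bin edges is shared between the histogram and `pd.cut` so that each box summarises exactly the rows counted in its bar.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for tips.total_bill and tips.tip
rng = np.random.default_rng(1)
bill = rng.uniform(5, 50, size=300)
df = pd.DataFrame({"total_bill": bill,
                   "tip": 0.15 * bill + rng.normal(0, 0.5, size=300)})

# One set of edges for both layers, plus the midpoint of each bin
edges = np.histogram_bin_edges(df["total_bill"], bins=10)
mids = (edges[:-1] + edges[1:]) / 2

# Integer bin number for every row; include_lowest keeps the minimum value
bin_idx = pd.cut(df["total_bill"], bins=edges, labels=False, include_lowest=True)
groups = [df.loc[bin_idx == k, "tip"].to_numpy() for k in range(len(mids))]

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist(df["total_bill"], bins=edges, alpha=0.5)

# Numeric positions keep the x axis continuous, so axvline works
bp = ax2.boxplot(groups, positions=mids, widths=0.6 * np.diff(edges),
                 manage_ticks=False, patch_artist=True)
ax1.axvline(df["total_bill"].mean(), color="orange", linestyle="--",
            label="avg total_bill")
ax1.legend()
```

Setting `manage_ticks=False` stops `boxplot` from replacing the numeric x ticks with one tick per box, which is what keeps the axis usable for the vertical mean line.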

How to add a legend to a figure

I would like to add a legend to my figure. This is the code I used to draw the plot:
fig1, ax1 = plt.subplots()
ax1.legend("Test Legend")
ax1.plot(singleColumn)
This is the plot I get:
fig1 plot
If I use this code:
print(singleColumn.plot(legend=True))
Perfect Plot
I get a perfect plot with a legend, but if I plot other figures, all the plots get mixed up, so I would prefer the fig method.
How do I add the name of the data to the fig1 plot?
Best regards
Kai
edit:
singleColumn looks like this (head shown):
[5 rows x 13 columns]
MESSDATUM
2011-01-10 2.40
2011-02-02 2.17
2011-03-03 2.32
2011-04-06 1.67
2011-05-04 2.56
Name: 2433876, dtype: float64
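A minimal sketch of one way to get the series name into the legend while keeping the `fig1, ax1` style: pass a `label` to `ax1.plot`, then call `ax1.legend()` with no arguments after plotting. The values are the five rows shown above; the non-interactive backend is an assumption for running without a display. The original call probably fails for two reasons: it runs before anything has been plotted, and a bare string is treated as a sequence of one-character labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question
singleColumn = pd.Series(
    [2.40, 2.17, 2.32, 1.67, 2.56],
    index=pd.to_datetime(["2011-01-10", "2011-02-02", "2011-03-03",
                          "2011-04-06", "2011-05-04"]),
    name=2433876,
)

fig1, ax1 = plt.subplots()
# Attach the label to the line itself, using the series name
ax1.plot(singleColumn.index, singleColumn.values, label=str(singleColumn.name))
# Call legend() after plotting so it can collect the labelled line
legend = ax1.legend()

legend_texts = [t.get_text() for t in legend.get_texts()]
```

Since everything goes through `ax1`, other figures created with their own `plt.subplots()` call are not affected.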

Pandas df histo, format my x ticker and include empty

I got this pandas df:
index TIME
12:07 2019-06-03 12:07:28
10:04 2019-06-04 10:04:25
11:14 2019-06-09 11:14:25
...
I use this command to plot a histogram of the number of occurrences in each 15-minute period:
df['TIME'].groupby([df["TIME"].dt.hour, df["TIME"].dt.minute]).count().plot(kind="bar")
My plot looks like this:
How can I get x ticks like 10:15 instead of (10, 15), and how can I add the missing x ticks, like 9:15, 9:30, ..., to get a complete timeline?
You can resample your TIME column to 15-minute intervals and count the number of rows, then plot a regular bar chart.
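import numpy as np
import pandas as pd
import matplotlib.ticker
# pd.np was removed in pandas 2.0, so use numpy directly
df = pd.DataFrame({'TIME': pd.to_datetime('2019-01-01') + pd.to_timedelta(np.random.rand(100) * 3, unit='h')})
df = df[df.TIME.dt.minute > 15] # make gap
ax = df.resample('15min', on='TIME').count().plot.bar(rot=0)  # '15min' replaces the deprecated '15T' alias
# Keep only the HH:MM part of each tick label
ticklabels = [x.get_text()[-8:-3] for x in ax.get_xticklabels()]
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(ticklabels))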
(for details about formatting datetime ticklabels of pandas bar plots see this SO question)
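If the timestamps span several days, as in the question, and the goal is a count per time of day, a variation is to floor each timestamp to its 15-minute slot, format the slot as `HH:MM`, and reindex against a full list of slots so that empty ones appear with a count of 0. This is a sketch under assumptions: the 09:00 to 12:15 window and the name `all_slots` are illustrative choices, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# The three timestamps shown in the question
df = pd.DataFrame({"TIME": pd.to_datetime([
    "2019-06-03 12:07:28",
    "2019-06-04 10:04:25",
    "2019-06-09 11:14:25",
])})

# Floor to the 15-minute slot and keep only the time of day as text
slots = df["TIME"].dt.floor("15min").dt.strftime("%H:%M")

# Every slot in the window of interest (assumed: 09:00 to 12:15)
all_slots = pd.date_range("2019-01-01 09:00", "2019-01-01 12:15",
                          freq="15min").strftime("%H:%M")

# Count per slot, then add the slots that never occurred with a count of 0
counts = slots.value_counts().reindex(all_slots, fill_value=0)

ax = counts.plot.bar(rot=90)
tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

The labels are already strings such as `10:15`, so no tick formatter is needed.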

Tick labels overlap in pandas bar chart

TL;DR: In pandas how do I plot a bar chart so that its x axis tick labels look like those of a line chart?
I made a time series with evenly spaced intervals (one item each day) and can plot it like such just fine:
intensity[350:450].plot()
plt.show()
But switching to a bar chart created this mess:
intensity[350:450].plot(kind = 'bar')
plt.show()
I then created a bar chart using matplotlib directly but it lacks the nice date time series tick label formatter of pandas:
def bar_chart(series):
    fig, ax = plt.subplots(1)
    ax.bar(series.index, series)
    fig.autofmt_xdate()
    plt.show()
bar_chart(intensity[350:450])
Here's an excerpt from the intensity Series:
intensity[390:400]
2017-03-07 3
2017-03-08 0
2017-03-09 3
2017-03-10 0
2017-03-11 0
2017-03-12 0
2017-03-13 2
2017-03-14 0
2017-03-15 3
2017-03-16 0
Freq: D, dtype: int64
I could go all out on this and create the tick labels completely by hand, but I'd rather not have to babysit matplotlib. I'd like pandas to do its job and do what it did in the very first figure, but with a bar plot. So how do I do that?
Pandas bar plots are categorical plots. They create one tick (+label) for each category. If the categories are dates and those dates are continuous one may aim at leaving certain dates out, e.g. to plot only every fifth category,
ax = series.plot(kind="bar")
ax.set_xticklabels([t if not i%5 else "" for i,t in enumerate(ax.get_xticklabels())])
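A self-contained variant of the every-fifth-label idea, as a sketch: it builds the label strings from the index with `strftime`, so the bars that keep a label show a short date rather than a full timestamp. The 20-day series is hypothetical sample data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily series
series = pd.Series(range(20),
                   index=pd.date_range("2017-03-01", periods=20, freq="D"))

ax = series.plot(kind="bar")

# One label per bar: a short date on every fifth bar, empty text elsewhere
labels = [d.strftime("%Y-%m-%d") if i % 5 == 0 else ""
          for i, d in enumerate(series.index)]
ax.set_xticklabels(labels)

shown = [t.get_text() for t in ax.get_xticklabels()]
```

There is still one tick per bar; only the text of four out of every five is blanked.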
In contrast, matplotlib bar charts are numerical plots. Here a useful ticker can be applied, which ticks the dates weekly, monthly or whatever is needed.
In addition, matplotlib gives full control over the tick positions and their labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
index = pd.date_range("2018-01-26", "2018-05-05")
series = pd.Series(np.random.rayleigh(size=100), index=index)
plt.bar(series.index, series.values)
plt.gca().xaxis.set_major_locator(dates.MonthLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter("%b\n%Y"))
plt.show()
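For weekly rather than monthly ticks, the locator can be swapped. The sketch below uses `dates.WeekdayLocator` with `byweekday=dates.MO` to put a tick on every Monday; the choice of Monday and the `"%d %b"` label format are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import dates

index = pd.date_range("2018-01-26", "2018-05-05")
series = pd.Series(np.random.default_rng(0).rayleigh(size=len(index)),
                   index=index)

fig, ax = plt.subplots()
ax.bar(series.index, series.values)

# One major tick per Monday, labelled with day and abbreviated month
ax.xaxis.set_major_locator(dates.WeekdayLocator(byweekday=dates.MO))
ax.xaxis.set_major_formatter(dates.DateFormatter("%d %b"))
fig.autofmt_xdate()

# Draw once so the axis limits are final, then read back the tick dates
fig.canvas.draw()
tick_dates = dates.num2date(ax.get_xticks())
```

Other locators in `matplotlib.dates`, such as `DayLocator` or `AutoDateLocator`, can be substituted in the same place.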