How to plot plotBox and a line plot with different axes - pandas

I have a dataset that can be crafted in this way:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for d in date_range * 3:
    if np.random.randint(0, 2) == 0:
        rows.append({'Date': d, 'Values': np.random.randint(1, 11)})
df_left = pd.DataFrame(rows, columns=['Date', 'Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")
df_right = pd.DataFrame({
    "Date": date_range,
    "Values": np.random.randint(0, 50, len(date_range)),
})
# weekly totals: resample('W') bins end on Sunday and are labelled by that Sunday
df_right_counted = df_right.resample('W', on='Date')['Values'].sum().to_frame().reset_index()
df_right_counted["year-week"] = df_right_counted["Date"].dt.strftime("%Y-%U")
This gives df_right_counted:
Date Values year-week
0 2021-12-05 135 2021-49
1 2021-12-12 219 2021-50
2 2021-12-19 136 2021-51
3 2021-12-26 158 2021-52
4 2022-01-02 123 2022-01
5 2022-01-09 222 2022-02
And df_left:
Date Values year-week
0 2021-12-01 10 2021-48
1 2021-12-05 1 2021-49
2 2021-12-07 5 2021-49
13 2022-01-07 7 2022-01
14 2022-01-08 9 2022-01
15 2022-01-09 6 2022-02
And I'd like to create this graph in matplotlib.
In this graph, a boxplot of df_left uses the y-axis on the left, and a normal line plot of df_right_counted uses the y-axis on the right.
I have made an attempt (including the fix from Javier's comment), but I am completely stuck on:
making both graphs start from the same week (I'd like to start from 2021-49)
adding a second y-axis on the right and letting the line plot use it
This is my attempt so far:
fig, ax = plt.subplots(nrows=1, ncols=1, dpi=100)
ax2 = ax.twinx()  # second y-axis on the right for the line plot
df_left.boxplot(figsize=(31, 8), column='Values', by='year-week', ax=ax)
df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)
Could you give me some guidance? I am still learning matplotlib.

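One of the problems is that resample('W', on='Date') and .dt.strftime("%Y-%U") lead to different week numbers in the two dataframes: resample('W') labels each bin with the Sunday that ends it, while %U counts weeks that start on a Sunday. Another problem is that boxplot places the boxes at x positions starting from 1, whereas the line plot places its points starting from 0.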
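A small check of the first problem, assuming two full %U weeks of dummy data (28 November to 11 December 2021) where every value is 1. The variable names are made up for illustration.

```python
import pandas as pd

# Sunday 2021-11-28 .. Saturday 2021-12-11: exactly two %U weeks (48 and 49)
dates = pd.date_range('2021-11-28', '2021-12-11')
df = pd.DataFrame({'Date': dates, 'Values': 1})

# resample('W') is 'W-SUN': each bin runs Monday..Sunday and is labelled by its Sunday
by_resample = df.resample('W', on='Date')['Values'].sum()
resample_labels = by_resample.index.strftime('%Y-%U').tolist()

# %U weeks run Sunday..Saturday, so grouping on the label keeps those days together
df['year-week'] = df['Date'].dt.strftime('%Y-%U')
by_label = df.groupby('year-week')['Values'].sum()

print(list(zip(resample_labels, by_resample.tolist())))
print(by_label.to_dict())
```

The resample bin labelled 2021-49 holds six days that %U assigns to week 48, which is why deriving the label first and then grouping keeps both frames on the same week numbers.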
Some possible workarounds:
force boxplot to number its positions starting from zero
create the counts by first extracting the year-week and then using groupby; that way the week numbers are consistent
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for d in date_range * 3:
    if np.random.randint(0, 2) == 0:
        rows.append({'Date': d, 'Values': np.random.randint(1, 11)})
df_left = pd.DataFrame(rows, columns=['Date', 'Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")
df_right = pd.DataFrame({"Date": date_range,
                         "Values": np.random.randint(0, 50, len(date_range))})
# derive the week label the same way as for df_left, then sum per week
df_right["year-week"] = df_right["Date"].dt.strftime("%Y-%U")
df_right_counted = df_right.groupby('year-week')['Values'].sum().to_frame().reset_index()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(31, 8), dpi=100)
ax2 = ax.twinx()  # second y-axis on the right, sharing the x-axis
# place the boxes at 0, 1, 2, ... to match the positions used by the line plot
df_left.boxplot(column='Values', by='year-week', ax=ax,
                positions=np.arange(df_left['year-week'].nunique()))
df_right_counted.plot(x='year-week', y='Values', ax=ax2)
plt.show()
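If the plot should also start at 2021-49 and stay aligned even when df_left happens to have no samples in some week, one option is to skip pandas' boxplot and give matplotlib one explicit, shared list of weeks. This is a sketch with seeded dummy data (not the question's numbers); it swaps in plain ax.boxplot and ax2.plot, and the names weeks, positions, box_data and weekly_sum are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range(start='2021-11-20', end='2022-01-09')

# Dummy data shaped like the question's: random samples on the left, one value per day on the right
df_left = pd.DataFrame({'Date': dates[rng.integers(0, len(dates), size=100)],
                        'Values': rng.integers(1, 11, size=100)})
df_right = pd.DataFrame({'Date': dates,
                         'Values': rng.integers(0, 50, size=len(dates))})

# Build the week label the same way for both frames
for frame in (df_left, df_right):
    frame['year-week'] = frame['Date'].dt.strftime('%Y-%U')

# One shared, ordered list of weeks, starting from 2021-49
weeks = sorted(w for w in df_right['year-week'].unique() if w >= '2021-49')
positions = np.arange(len(weeks))

# One array per week for the boxes; a week without samples gives an empty box
box_data = [df_left.loc[df_left['year-week'] == w, 'Values'].to_numpy() for w in weeks]
weekly_sum = df_right.groupby('year-week')['Values'].sum().reindex(weeks, fill_value=0)

fig, ax = plt.subplots(figsize=(12, 4))
ax.boxplot(box_data, positions=positions)
ax.set_ylabel('df_left Values')

ax2 = ax.twinx()  # right-hand y-axis sharing the same x positions
ax2.plot(positions, weekly_sum.to_numpy(), color='tab:red', marker='o')
ax2.set_ylabel('weekly sum of df_right Values')

ax.set_xticks(positions)
ax.set_xticklabels(weeks, rotation=45)
ax.set_xlabel('year-week')
```

Note that %U puts 1 January 2022 (a Saturday) into its own one-day week, 2022-00. If that is unwanted, ISO weeks from Date.dt.isocalendar() may be a better key, since ISO week 52 of 2021 runs through 2 January 2022.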


Changing the tick frequency on the x axis for each subplot on a FacetGrid plot in Seaborn

I am trying to create a FacetGrid plot in Seaborn, where I have a yearWeek column as the x-axis and a conversionRate column as the y-axis. However, I want to only display every second yearWeek on the x-axis. How can I achieve this?
My current attempt:
import platform
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
print(f'Python version: {platform.python_version()}')
print(f'Seaborn version: {sns.__version__}')
weeks = [f'2022-W{i}' for i in range(1, 13)]
data = {'yearWeek': weeks * 3,
        'country': ['US'] * 12 + ['India'] * 12 + ['Australia'] * 12,
        'conversionRate': [np.random.rand() for i in range(12 * 3)]}
df = pd.DataFrame(data)
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
Instead of slicing your dataframe, you can define the tick spacing with matplotlib.ticker, which keeps every data point and only thins out the labels. This is useful for any kind of plot where you don't want automatic ticks.
Here is your modified code:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
weeks = [f'2022-W{i}' for i in range(1, 13)]
data = {'yearWeek': weeks * 3,
        'country': ['US'] * 12 + ['India'] * 12 + ['Australia'] * 12,
        'conversionRate': [np.random.rand() for i in range(12 * 3)]}
df = pd.DataFrame(data)
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
xtick_spacing = 2
for ax in g.axes.flat:
    # categories sit at x = 0, 1, 2, ..., so a tick every 2 units shows every second yearWeek
    ax.xaxis.set_major_locator(ticker.MultipleLocator(xtick_spacing))
    # alternative: ax.set_xticks(ax.get_xticks()[::2])
    ax.tick_params(axis='x', labelrotation=45)
plt.show()
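The locator approach works because a categorical axis places the labels at x = 0, 1, 2, and so on. A minimal matplotlib-only check of that, assuming the same twelve yearWeek labels and made-up y values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

weeks = [f'2022-W{i}' for i in range(1, 13)]
values = [i / 12 for i in range(12)]  # made-up y values

fig, ax = plt.subplots()
ax.plot(weeks, values)  # string x values become categories at x = 0, 1, ..., 11
ax.xaxis.set_major_locator(ticker.MultipleLocator(2))  # one tick every second category
fig.canvas.draw()

# ticks inside the data range, and the category names the axis formatter shows for them
visible = [loc for loc in ax.get_xticks() if 0 <= loc <= 11]
labels = ax.xaxis.get_major_formatter().format_ticks(visible)
print(labels)
```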
If I understand correctly, the question is mostly about pandas.
You just need to slice df appropriately to get every second yearWeek: df = df[1::2].
Additionally, you can use reset_index with drop=True argument to "reset the index to the default integer index" of the DataFrame df post-slicing.
Full code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate reproducible `conversionRate`
rng = np.random.default_rng(12)
weeks = [f'2022-W{i}' for i in range(1, 13)]
data = {'yearWeek': weeks * 3,
        'country': ['US'] * 12 + ['India'] * 12 + ['Australia'] * 12,
        'conversionRate': [rng.random() for i in range(12 * 3)]}
df = pd.DataFrame(data)
# keep every second row (W2, W4, ..., W12 for each country) and renumber the index
df = df[1::2].reset_index(drop=True)
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
plt.show()
where df is now:
yearWeek country conversionRate
0 2022-W2 US 0.946753
1 2022-W4 US 0.179291
2 2022-W6 US 0.230541
3 2022-W8 US 0.115079
4 2022-W10 US 0.858130
5 2022-W12 US 0.541466
6 2022-W2 India 0.257955
7 2022-W4 India 0.453616
8 2022-W6 India 0.927517
9 2022-W8 India 0.187890
10 2022-W10 India 0.946619
11 2022-W12 India 0.880250
12 2022-W2 Australia 0.936696
13 2022-W4 Australia 0.871556
14 2022-W6 Australia 0.219390
15 2022-W8 Australia 0.661634
16 2022-W10 Australia 0.201345
17 2022-W12 Australia 0.763625
Finally, if you don't want to change the original df, see "Returning a view versus a copy" in the pandas documentation as a precaution.
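Note that df[1::2] depends on row order: it yields every second yearWeek per country only because each country block has an even number of rows (12) and is sorted by week. A sketch that selects even week numbers directly instead, with made-up conversionRate values; week_no and every_second are illustrative names:

```python
import pandas as pd

weeks = [f'2022-W{i}' for i in range(1, 13)]
df = pd.DataFrame({'yearWeek': weeks * 3,
                   'country': ['US'] * 12 + ['India'] * 12 + ['Australia'] * 12,
                   'conversionRate': [i / 36 for i in range(36)]})  # made-up values

# Pull the number after 'W' and keep even weeks, regardless of how the rows are ordered
week_no = df['yearWeek'].str.extract(r'W(\d+)', expand=False).astype(int)
every_second = df[week_no % 2 == 0].reset_index(drop=True)
print(every_second.groupby('country').size())
```

For this data it keeps exactly the same rows as df[1::2], but it would still be correct after shuffling or filtering the frame.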

How to draw pandas dataframe using Matplotlib hist with multiple y axes

I have a dataframe as below:
frame_id time_stamp pixels step
0 50 06:34:10 0.000000 0
1 100 06:38:20 0.000000 0
2 150 06:42:30 3.770903 1
3 200 06:46:40 3.312285 1
4 250 06:50:50 3.077356 0
5 300 06:55:00 2.862603 0
I want two y-axes in one plot: one for pixels and the other for step, with time_stamp on the x-axis. I want step drawn like the green line in this plot:
Here's an example that could help. Replace d1 and d2 with your own variables and adjust the labels accordingly.
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(seed=0)
d1 = rng.normal(loc=20, scale=5, size=200)
d2 = rng.normal(loc=30, scale=5, size=500)
fig, ax1 = plt.subplots(figsize=(9, 5))
# create twin axes: ax2 shares the x-axis but has its own y-axis on the right
ax2 = ax1.twinx()
ax1.hist([d1], bins=15, histtype='barstacked', linewidth=2)
# a 'step' histogram draws only the outline, like the green line in the question
ax2.hist([d2], bins=15, histtype='step', linewidth=2, color='tab:green')
ax1.set_ylabel('d1 freq')
ax2.set_ylabel('d2 freq')
plt.show()
Getting bar labels is not easy when two kinds of histograms share the same plot in matplotlib.
Instead of histograms, you could use bar plots to get the desired output. I have also added a function that adds the bar labels.
import matplotlib.pyplot as plt
import numpy as np
time = ['06:34:10', '06:38:20', '06:42:30', '06:46:40', '06:50:50', '06:55:00']
step = [0, 0, 1, 1, 0, 0]
pixl = [0.00, 0.00, 3.77, 3.31, 3.077, 2.862]
# function to add a value label above each bar (categorical x positions are 0, 1, 2, ...)
def addlabels(ax, x, y):
    for i in range(len(x)):
        ax.text(i, y[i], y[i], ha='center')
fig, ax1 = plt.subplots(figsize=(9, 5))
# generate twin axes
ax2 = ax1.twinx()
ax1.step(time, step, 'k', where="mid", linewidth=1)
ax2.bar(time, pixl, linewidth=1)
addlabels(ax2, time, pixl)
plt.show()
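With matplotlib 3.4 or newer, Axes.bar_label can replace the hand-written label helper. A sketch using the dataframe from the question, with pixels as labelled bars on the left axis and step as a green step line on a twin axis on the right; the colors and the y-limit padding are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The dataframe from the question
df = pd.DataFrame({
    'frame_id': [50, 100, 150, 200, 250, 300],
    'time_stamp': ['06:34:10', '06:38:20', '06:42:30', '06:46:40', '06:50:50', '06:55:00'],
    'pixels': [0.000000, 0.000000, 3.770903, 3.312285, 3.077356, 2.862603],
    'step': [0, 0, 1, 1, 0, 0],
})

fig, ax1 = plt.subplots(figsize=(9, 5))
bars = ax1.bar(df['time_stamp'], df['pixels'], color='tab:blue')
labels = ax1.bar_label(bars, fmt='%.2f')  # value label on top of every bar
ax1.set_ylabel('pixels')

ax2 = ax1.twinx()  # second y-axis on the right for the 0/1 step signal
ax2.step(df['time_stamp'], df['step'], where='mid', color='green', linewidth=2)
ax2.set_ylabel('step')
ax2.set_ylim(-0.1, 1.1)
```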

How to format in Matplotlib the x axis ticks to the only hours and minutes and define regular intervals?

I would like the x-axis to have the following ticks at regular intervals:
[6:00, 8:00, 10:00, 12:00, 14:00, 16:00, 18:00]. The point at 12:00 should also be in the center of the figure; currently it is shifted to the right.
I tried to convert the column 'time' to a datetime format, but I get an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
My code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df3 = pd.read_excel(r'results.xlsx')
# failed attempt
df3["time"] = pd.to_datetime(df3["time"])
# plotting
color = 'black'
ax1 = sns.lineplot(x='time', y='parabolic', data=df3, color=color,
                   label='Light intensity')
ax2 = ax1.twinx()
ax2 = sns.scatterplot(x='time', y='Gas-exchange_p', hue='Sampling',
                      marker='v', s=200, data=df3, label='Measurement time points')
My dataframe in Excel looks like this:
time y
12:00:00 AM 0
12:01:00 AM 0
12:02:00 AM 0
2:00:40 PM 416
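One way around the TypeError is to attach a dummy date to every datetime.time (pd.to_datetime cannot convert bare time objects) and then control the ticks with matplotlib.dates. This sketch assumes a made-up df3 with only 'time' and 'parabolic' columns and an arbitrary date of 2000-01-01; the twin-axis scatter plot is left out:

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df3: a 'time' column of datetime.time objects
minutes = list(range(0, 24 * 60, 30))
df3 = pd.DataFrame({
    'time': [dt.time(m // 60, m % 60) for m in minutes],
    'parabolic': [max(0.0, 1 - ((m - 720) / 360) ** 2) for m in minutes],
})

# Combine each time with an arbitrary date so matplotlib can treat it as a datetime
dummy_date = dt.date(2000, 1, 1)
df3['datetime'] = [dt.datetime.combine(dummy_date, t) for t in df3['time']]

fig, ax1 = plt.subplots()
ax1.plot(df3['datetime'], df3['parabolic'], color='black', label='Light intensity')

# Fixed window 06:00-18:00, so 12:00 sits exactly in the middle
ax1.set_xlim(dt.datetime.combine(dummy_date, dt.time(6)),
             dt.datetime.combine(dummy_date, dt.time(18)))

# Ticks at 06:00, 08:00, ..., 18:00, labelled with hours and minutes only
ax1.xaxis.set_major_locator(mdates.HourLocator(byhour=range(6, 19, 2)))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

tick_labels = [mdates.num2date(t).strftime('%H:%M') for t in ax1.get_xticks()]
print(tick_labels)
```

With seaborn, the same set_xlim, HourLocator and DateFormatter calls should also work on the Axes returned by sns.lineplot, as long as x is the new datetime column rather than the original time column.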

Plotting annual mean and standard deviation in different colors for each year

I have data for several years and have calculated the mean and standard deviation for each year. Now I want to plot each year's mean as a scatter point and fill the area between mean minus standard deviation and mean plus standard deviation, using a different color for each year.
After df_wc.set_index('Date').resample('Y')["Ratio(a/w)"].mean(), each year is represented only by its last date (as shown in the sample data set below), but I want the standard-deviation fill to span the entire year.
Sample Data set:
Date Mean Std_dv
1858-12-31 1.284273 0.403052
1859-12-31 1.235267 0.373283
1860-12-31 1.093308 0.183646
1861-12-31 1.403693 0.400722
That's a very good question, and it does not have an easy answer. If I have understood the problem correctly, you need a fill plot with a different colour for each year, with the upper and lower bounds at mean + std and mean - std?
So I built a custom time series and plotted the values with upper and lower bounds (here a fixed offset of 2 stands in for the standard deviation):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm
import pandas as pd
ts = range(10)
num_classes = len(ts)
df = pd.DataFrame(data={'TOTAL': np.random.rand(len(ts)), 'Label': list(range(0, num_classes))}, index=ts)
# band limits (a fixed offset of 2 here; use mean +/- std for real data)
df['UB'] = df['TOTAL'] + 2
df['LB'] = df['TOTAL'] - 2
colors = ['r', 'g', 'b', 'y', 'purple', 'orange', 'k', 'pink', 'grey', 'violet']
cmap = ListedColormap(colors)
norm = BoundaryNorm(range(num_classes + 1), cmap.N)
# split each curve into segments between consecutive points, so each segment can get its own colour
points = np.array([df.index, df['TOTAL']]).T.reshape(-1, 1, 2)
pointsUB = np.array([df.index, df['UB']]).T.reshape(-1, 1, 2)
pointsLB = np.array([df.index, df['LB']]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
segmentsUB = np.concatenate([pointsUB[:-1], pointsUB[1:]], axis=1)
segmentsLB = np.concatenate([pointsLB[:-1], pointsLB[1:]], axis=1)
lc = LineCollection(segments, cmap=cmap, norm=norm, linestyles='dashed')
lcUB = LineCollection(segmentsUB, cmap=cmap, norm=norm, linestyles='solid')
lcLB = LineCollection(segmentsLB, cmap=cmap, norm=norm, linestyles='solid')
fig1, ax = plt.subplots()
for coll in (lc, lcUB, lcLB):
    # colour segment i by Label i, then draw the collection on the axes
    coll.set_array(df['Label'].to_numpy()[:-1])
    ax.add_collection(coll)
# shade the band between x = i and x = i + 1 in colour i
for i in range(len(colors)):
    ax.fill_between(df.index, df['UB'], df['LB'], where=((df.index >= i) & (df.index <= i + 1)), alpha=0.1, color=colors[i])
ax.set_xlim(df.index.min(), df.index.max())
ax.set_ylim(-3.1, 3.1)
plt.show()
The resulting dataframe looks like this:
TOTAL Label UB LB
0 0.681455 0 2.681455 -1.318545
1 0.987058 1 2.987058 -1.012942
2 0.212432 2 2.212432 -1.787568
3 0.252284 3 2.252284 -1.747716
4 0.886021 4 2.886021 -1.113979
5 0.369499 5 2.369499 -1.630501
6 0.765192 6 2.765192 -1.234808
7 0.747923 7 2.747923 -1.252077
8 0.543212 8 2.543212 -1.456788
9 0.793860 9 2.793860 -1.206140
And the plot looks like this:
Let me know if this helps! :)
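For the goal stated in the question (one band per year, covering the whole year rather than only its last date), a simpler route may be one fill_between call per year, running from 1 January to the year-end label that resample('Y') produces. A sketch using the four sample rows from the question; spans and mid_year are illustrative names, and colors come from the default color cycle:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Yearly statistics from the sample in the question (each year labelled by its last day)
stats = pd.DataFrame(
    {'Mean': [1.284273, 1.235267, 1.093308, 1.403693],
     'Std_dv': [0.403052, 0.373283, 0.183646, 0.400722]},
    index=pd.to_datetime(['1858-12-31', '1859-12-31', '1860-12-31', '1861-12-31']),
)

colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
fig, ax = plt.subplots(figsize=(8, 4))
spans = []
for i, (year_end, row) in enumerate(stats.iterrows()):
    # stretch the band from 1 January to the year-end label
    year_start = pd.Timestamp(year=year_end.year, month=1, day=1)
    spans.append((year_start, year_end))
    color = colors[i % len(colors)]
    ax.fill_between([year_start, year_end],
                    row['Mean'] - row['Std_dv'], row['Mean'] + row['Std_dv'],
                    color=color, alpha=0.3)
    # the mean as a single scatter point in the middle of the year
    mid_year = year_start + (year_end - year_start) / 2
    ax.scatter([mid_year], [row['Mean']], color=color, zorder=3)

ax.set_xlabel('Date')
ax.set_ylabel('Mean +/- Std_dv')
```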

ax.twinx label appears twice

I have been trying to make a chart from an Excel file using Matplotlib and Seaborn. The code is from the internet, adapted to what I want.
The issue is that the legend appears twice.
Do you have any recommendations?
The Excel table is:
Month Value (tsd eur) Total MAE
0 Mar 2020 14.0 1714.0
1 Apr 2020 22.5 1736.5
2 Jun 2020 198.0 1934.5
3 Jan 2021 45.0 1979.5
4 Feb 2021 60.0 2039.5
5 Jan 2022 67.0 2106.5
6 Feb 2022 230.0 2336.5
7 Mar 2022 500.0 2836.5
The code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# mae is the DataFrame read from the Excel table above
mae['Month'] = mae['Month'].apply(lambda x: pd.Timestamp(x).strftime('%b %Y'))
a = mae['Value (tsd eur)']
b = mae['Total MAE']
# create combo chart
fig, ax1 = plt.subplots(figsize=(20, 12))
color = 'tab:green'
# bar plot creation
ax1.set_title('MAE Investments', fontsize=25)
ax1.set_xlabel('Month', fontsize=23)
ax1.set_ylabel('Investments (tsd. eur)', fontsize=23)
ax1 = sns.barplot(x='Month', y='Value (tsd eur)', data=mae, palette='Blues', label="Value (tsd eur)", ax=ax1)
ax1.tick_params(axis='x', which='major', labelsize=20, labelrotation=40)
# specify we want to share the same x-axis
ax2 = ax1.twinx()
color = 'tab:red'
# line plot creation
ax2.set_ylabel('Total MAE Value', fontsize=16)
ax2 = sns.lineplot(x='Month', y='Total MAE', data=mae, sort=False, color='blue', label="Total MAE", ax=ax2)
ax2.tick_params(axis='y', color=color, labelsize=20)
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1 + h2, l1 + l2, loc=2, prop={'size': 24})
for i, j in b.items():
    ax2.annotate(str(j), xy=(i, j + 30))
for i, j in a.items():
    ax1.annotate(str(j), xy=(i, j + 2))
# show plot
plt.show()
Update: I found the answer here:
Secondary axis with twinx(): how to add to legend?
Code used:
lines, labels = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines + lines2, labels + labels2, title="Legend", loc=2, prop={'size': 24})
instead of:
for i, j in b.items():
    ax2.annotate(str(j), xy=(i, j + 30))
for i, j in a.items():
    ax1.annotate(str(j), xy=(i, j + 2))
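The second legend most likely comes from sns.lineplot, which typically draws its own legend on ax2 when label= is passed, while the code then adds a combined one on ax1. A sketch of the same chart in plain matplotlib (a swapped-in approach, not the original seaborn code) using the table from the question; plain bar and plot calls never create a legend by themselves, so only the one built from both axes' handles appears. It is placed on ax2 because the twin axes are drawn on top of ax1:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

mae = pd.DataFrame({
    'Month': ['Mar 2020', 'Apr 2020', 'Jun 2020', 'Jan 2021', 'Feb 2021', 'Jan 2022', 'Feb 2022', 'Mar 2022'],
    'Value (tsd eur)': [14.0, 22.5, 198.0, 45.0, 60.0, 67.0, 230.0, 500.0],
    'Total MAE': [1714.0, 1736.5, 1934.5, 1979.5, 2039.5, 2106.5, 2336.5, 2836.5],
})

fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.bar(mae['Month'], mae['Value (tsd eur)'], label='Value (tsd eur)')
ax2 = ax1.twinx()
ax2.plot(mae['Month'], mae['Total MAE'], color='blue', marker='o', label='Total MAE')

# Collect the handles of both axes and draw a single legend on the top-most (twin) axes
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax2.legend(h1 + h2, l1 + l2, loc='upper left')
```

If you keep seaborn, passing legend=False to sns.lineplot is another option worth trying, so that only the combined legend remains.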