Changing the tick frequency on the x axis for each subplot on a FacetGrid plot in Seaborn - matplotlib

Problem:
I am trying to create a FacetGrid plot in Seaborn, where I have a yearWeek column as the x-axis and a conversionRate column as the y-axis. However, I want to only display every second yearWeek on the x-axis. How can I achieve this?
 
My current attempt:
 
!python --version
print(f'Seaborn version: {sns.__version__}')
 
data = {'yearWeek': ['2022-W1','2022-W2','2022-W3','2022-W4','2022-W5','2022-W6','2022-W7','2022-W8','2022-W9','2022-W10','2022-W11','2022-W12']*3,
        'country': ['US','US','US','US','US','US','US','US','US','US','US','US'] + ['India','India','India','India','India','India','India','India','India','India','India','India'] + ['Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia'],
        'conversionRate': [np.random.rand() for i in range(12*3)]
       }
 
df = pd.DataFrame(data)
 
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
for ax in g.axes.flat:
    ax.set_xticks(ax.get_xticks()[::2])
    plt.setp(ax.get_xticklabels(), rotation=45)

Instead of slicing your dataframe you can just define the tick distance with matplotlib.ticker. This is very useful for all kinds of plots where you don't want to have auto-ticks.
See your modified code:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
data = {'yearWeek': ['2022-W1','2022-W2','2022-W3','2022-W4','2022-W5','2022-W6','2022-W7','2022-W8','2022-W9','2022-W10','2022-W11','2022-W12']*3,
'country': ['US','US','US','US','US','US','US','US','US','US','US','US'] + ['India','India','India','India','India','India','India','India','India','India','India','India'] + ['Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia'],
'conversionRate': [np.random.rand() for i in range(12*3)]
}
df = pd.DataFrame(data)
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
for ax in g.axes.flat:
    xtick_spacing = 2
    ax.xaxis.set_major_locator(ticker.MultipleLocator(xtick_spacing))
    # ax.set_xticks(ax.get_xticks()[::2])
    plt.setp(ax.get_xticklabels(), rotation=45)
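To see what the locator does on its own, here is a minimal sketch on a plain matplotlib axis (no Seaborn), assuming the same twelve week labels. String x values are placed at the integer positions 0, 1, 2, ..., so a MultipleLocator with base 2 keeps every second position. The y data and the variable names are placeholders, not part of the original code.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

weeks = [f"2022-W{i}" for i in range(1, 13)]   # same labels as in the question
values = list(range(12))                       # placeholder y data (assumption)

fig, ax = plt.subplots()
ax.plot(weeks, values)                         # categories sit at x = 0..11

ax.xaxis.set_major_locator(ticker.MultipleLocator(2))

# The locator can also return ticks just outside the view, so keep the visible ones.
lo, hi = ax.get_xlim()
visible = [int(t) for t in ax.get_xticks() if lo <= t <= hi]
shown_labels = [weeks[i] for i in visible]
print(shown_labels)
```

Because the ticks land on positions 0, 2, 4, ..., the labels that remain are the odd weeks (W1, W3, ...). Call plt.show() to display the figure.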

If I understand correctly, the question is mostly a pandas one.
You just need to slice df appropriately to keep every second yearWeek: df = df[1::2]. Note that this drops the other rows from the plot entirely rather than only hiding their tick labels.
Additionally, you can call reset_index with the drop=True argument to "reset the index to the default integer index" of the DataFrame df after slicing.
Full code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# generate reproducible `conversionRate`
rng = np.random.default_rng(12)
data = {'yearWeek': ['2022-W1','2022-W2','2022-W3','2022-W4','2022-W5','2022-W6','2022-W7','2022-W8','2022-W9','2022-W10','2022-W11','2022-W12']*3,
'country': ['US','US','US','US','US','US','US','US','US','US','US','US'] + ['India','India','India','India','India','India','India','India','India','India','India','India'] + ['Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia','Australia'],
'conversionRate': [rng.random() for i in range(12*3)]
}
df = pd.DataFrame(data)
df = df[1::2].reset_index(drop=True)
print(df)
g = sns.FacetGrid(df, col="country", aspect=1.5)
g.map_dataframe(sns.lineplot, x='yearWeek', y='conversionRate')
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
plt.show()
with df printed as:
    yearWeek    country  conversionRate
0    2022-W2         US        0.946753
1    2022-W4         US        0.179291
2    2022-W6         US        0.230541
3    2022-W8         US        0.115079
4   2022-W10         US        0.858130
5   2022-W12         US        0.541466
6    2022-W2      India        0.257955
7    2022-W4      India        0.453616
8    2022-W6      India        0.927517
9    2022-W8      India        0.187890
10  2022-W10      India        0.946619
11  2022-W12      India        0.880250
12   2022-W2  Australia        0.936696
13   2022-W4  Australia        0.871556
14   2022-W6  Australia        0.219390
15   2022-W8  Australia        0.661634
16  2022-W10  Australia        0.201345
17  2022-W12  Australia        0.763625
Finally, if you don't want to change the original df, please check "Returning a view versus a copy" in the pandas documentation as a precautionary measure.
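As a rough sketch of that last point, the slice can be taken on an explicit copy so that the original df keeps all of its rows. The second variant, which filters on the week number instead of the row position, is my own assumption about what may be more robust if the rows are not sorted; the conversionRate values are placeholders.

```python
import pandas as pd

weeks = [f"2022-W{i}" for i in range(1, 13)]
df = pd.DataFrame({
    "yearWeek": weeks * 3,
    "country": ["US"] * 12 + ["India"] * 12 + ["Australia"] * 12,
    "conversionRate": list(range(36)),   # placeholder values (assumption)
})

# Positional slice on an explicit copy: df itself keeps all 36 rows.
every_second = df.iloc[1::2].copy().reset_index(drop=True)

# Alternative that does not depend on row order: filter on the week number.
week_no = df["yearWeek"].str.split("W").str[1].astype(int)
even_weeks = df[week_no % 2 == 0].reset_index(drop=True)

print(len(df), len(every_second), every_second.equals(even_weeks))
```

With the data from the question both variants select the same rows, because every country has exactly twelve weeks in order.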

Related

how to plot rows of a column in loop

I have a data frame and I want to plot rows of a column in a loop
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
df = pd.DataFrame({
'name': ['joe', 'sue', 'mike'],
'x': ['0,1,5,3,4,5', '0,4,4,2,8', '0,4,6,7,8,9,0'],
'y': ['0,3,8', '1,9,5', '1,6,4,5,6,2,3,4,6']
})
print(df)
   name              x                  y
0   joe    0,1,5,3,4,5              0,3,8
1   sue      0,4,4,2,8              1,9,5
2  mike  0,4,6,7,8,9,0  1,6,4,5,6,2,3,4,6
I want to plot every row of x in a loop: for example, first 0,1,5,3,4,5, then 0,4,4,2,8, then 0,4,6,7,8,9,0.
You can use:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for row in df['x'].str.split(','):  # split each string into a list
    data = np.array(row).astype(float)  # convert str to float
    ax.plot(data)
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_title('My figure')
plt.show()
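If you also want to tell the lines apart, a small variation is to loop over the name column at the same time and use it as the legend label. This is only a sketch built on the question's df; using name as the label is my assumption about what would be useful.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['joe', 'sue', 'mike'],
    'x': ['0,1,5,3,4,5', '0,4,4,2,8', '0,4,6,7,8,9,0'],
})

fig, ax = plt.subplots()
for name, text in zip(df['name'], df['x']):
    data = np.array(text.split(','), dtype=float)  # "0,1,5" -> [0., 1., 5.]
    ax.plot(data, label=name)                      # one line per row
ax.legend()
```

Call plt.show() afterwards to display the figure.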

How to plot a boxplot and a line plot with different axes

I have a dataset that can be crafted in this way:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# collect the rows in a list first (DataFrame.append was removed in pandas 2.0)
rows = []
for d in date_range*3:
    if (np.random.randint(0,2) == 0):
        rows.append({'Date': d, 'Values': np.random.randint(1,11)})
df_left = pd.DataFrame(rows, columns=['Date','Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")
df_right = pd.DataFrame(
    {
        "Date": date_range,
        "Values": np.random.randint(0, 50 , len(date_range)),
    }
)
df_right_counted = df_right.resample('W', on='Date')['Values'].sum().to_frame().reset_index()
df_right_counted["year-week"] = df_right_counted["Date"].dt.strftime("%Y-%U")
df_right_counted:
        Date  Values year-week
0 2021-12-05     135   2021-49
1 2021-12-12     219   2021-50
2 2021-12-19     136   2021-51
3 2021-12-26     158   2021-52
4 2022-01-02     123   2022-01
5 2022-01-09     222   2022-02
And df_left:
         Date Values year-week
0  2021-12-01     10   2021-48
1  2021-12-05      1   2021-49
2  2021-12-07      5   2021-49
...
13 2022-01-07      7   2022-01
14 2022-01-08      9   2022-01
15 2022-01-09      6   2022-02
And I'd like to create this graph in matplotlib.
Where a boxplot is plotted with df_left and it uses the y-axis on the left and a normal line plot is plotted with df_right_counted and uses the y-axis on the right.
With my attempt (including the fix from Javier's comment) I am completely stuck on:
making both graphs start from the same week (I'd like to start from 2021-49)
plotting another y-axis on the right and letting the line plot use it
This is my attempt so far:
fig, ax = plt.subplots(nrows=1, ncols=1, dpi=100)
fig.tight_layout()
fig.set_tight_layout(True)
fig.set_facecolor('white')
ax2=ax.twinx()
df_left.boxplot(figsize=(31, 8), column='Values', by='year-week', ax=ax)
df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)
plt.show()
Could you give me some guidance? I am still learning using matplotlib
One of the problems is that resample('W', on='Date') and .dt.strftime("%Y-%U") seem to lead to different numbers in both dataframes. Another problem is that boxplot internally labels the boxes starting with 1.
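To make the first problem concrete, here is a small sketch with one Monday-to-Sunday week of made-up data (one value of 1 per day, which is an assumption for illustration). resample('W') puts all seven days into a single bin labelled by its closing Sunday, while %U starts a new week on that Sunday, so the two approaches disagree about which week most of the days belong to.

```python
import pandas as pd

# Monday 2021-11-29 .. Sunday 2021-12-05, one observation per day (made-up data)
dates = pd.date_range("2021-11-29", "2021-12-05")
df = pd.DataFrame({"Date": dates, "Values": 1})

# Weekly resampling: one bin, labelled by the Sunday that closes it.
by_resample = df.resample("W", on="Date")["Values"].sum()
resample_labels = by_resample.index.strftime("%Y-%U").tolist()

# Grouping by the formatted week of each individual day.
by_group = df.groupby(df["Date"].dt.strftime("%Y-%U"))["Values"].sum()

print(resample_labels, by_resample.tolist())
print(by_group.to_dict())
```

The resampled bin is labelled 2021-49 and holds all seven days, whereas grouping by the formatted string assigns six of them to 2021-48 and only the Sunday to 2021-49.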
Some possible workarounds:
force boxplot to number its boxes starting from zero (via the positions argument)
create the counts by first extracting the year-week and then using groupby; that way the week numbers should be consistent
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
date_range = pd.date_range(start='2021-11-20', end='2022-01-09').to_list()
# collect the rows in a list first (DataFrame.append was removed in pandas 2.0)
rows = []
for d in date_range * 3:
    if (np.random.randint(0, 2) == 0):
        rows.append({'Date': d, 'Values': np.random.randint(1, 11)})
df_left = pd.DataFrame(rows, columns=['Date', 'Values'])
df_left["year-week"] = df_left["Date"].dt.strftime("%Y-%U")
df_right = pd.DataFrame({"Date": date_range,
                         "Values": np.random.randint(0, 50, len(date_range))})
df_right["year-week"] = df_right["Date"].dt.strftime("%Y-%U")
# group by the same year-week string so both frames use identical week labels
df_right_counted = df_right.groupby('year-week')['Values'].sum().to_frame().reset_index()
fig, ax = plt.subplots(nrows=1, ncols=1, dpi=100)
fig.tight_layout()
fig.set_tight_layout(True)
fig.set_facecolor('white')
ax2 = ax.twinx()
# place the boxes at 0, 1, 2, ... instead of the default 1, 2, 3, ...
df_left.boxplot(figsize=(31, 8), column='Values', by='year-week', ax=ax,
                positions=np.arange(len(df_left['year-week'].unique())))
df_right_counted.plot(figsize=(31, 8), x='year-week', y='Values', ax=ax2)
plt.show()

PLOTLY: Round values in a pie chart

I have the following code to create a plotly pie chart
fig6 = px.pie(df_final, values='Price', names='Domain',
title='Total Price')
fig6.update_traces(textposition='inside', textinfo='percent+label+value')
fig6.show()
The values ("Price") are in the millions and are shown as such in the pie chart. For example, 23,650,555 is one price value on the pie chart. However, I would like to round it up to 24 (million), and do the same for every slice of the pie chart.
Thanks,
You could round the Price column up before plotting. The example below uses a pandas/matplotlib pie chart rather than Plotly.
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
"Price": [23650555, 31483589, 10473875, 56346772],
"Domain": ["3", "1", "0", "6"],
})
def roundup(x):
    # round up to the next full million
    return int(math.ceil(x / 1000000.0)) * 1000000
def absolute_value(val):
    # autopct passes the percentage; convert it back to an absolute value
    a = np.round(val/100.*df['Price'].sum(), 0)
    return roundup(a)
print(df)
#       Price Domain
# 0  23650555      3
# 1  31483589      1
# 2  10473875      0
# 3  56346772      6
df['Price'] = df['Price'].apply(roundup)
df.set_index(['Domain'], inplace=True)
print(df)
#            Price
# Domain
# 3       24000000
# 1       32000000
# 0       11000000
# 6       57000000
df.plot(kind='pie', y='Price', autopct=absolute_value)
plt.show()
Reference: Python round up integer to next hundred
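If you would rather show 24 than 24000000, one possible variation is to round up to whole millions in a separate column. This is only a sketch: the column name Price_M is hypothetical, and how you wire it into the chart is up to you.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Price": [23650555, 31483589, 10473875, 56346772],
    "Domain": ["3", "1", "0", "6"],
})

# Round up to whole millions; "Price_M" is a hypothetical column name.
df["Price_M"] = np.ceil(df["Price"] / 1_000_000).astype(int)
print(df["Price_M"].tolist())
```

Keep in mind that if the rounded column is used as the pie values, the percentages are computed from the rounded numbers rather than from the exact prices.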

How to plot two columns of a specific index range?

I made a dataframe from a .txt file that has 2 columns. I have a specific indexing range (3751:6252) for which I want to plot column 1 (freq) vs column 2 (phase).
How can I do this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#open text file and save it as a df (dataframe)
textfile = pd.read_csv("RawPhaseData.txt", sep=r'\s+', skiprows=2, header=None, names=['freq', 'phase'])
#make the index equal to frequency
#textfile.set_index("freq", inplace=True)
p = textfile[3751:6252]
x = p['freq']
y = p['phase']
plt.plot(x,y)
plt.show()
I expect a graph, but nothing is displayed.
Part of the results when I say print(p)
freq phase
3751 0.55000000000000 -51.101839657065
3752 0.55004000000000 -51.251119837268
3753 0.55008000000000 -51.400516517531
3754 0.55012000000000 -51.550029980720
3755 0.55016000000000 -51.699660509792
3756 0.55020000000000 -51.849408387787
....
Part of the results when I say print(x)
3751 0.55000000000000
3752 0.55004000000000
3753 0.55008000000000
3754 0.55012000000000
3755 0.55016000000000
3756 0.55020000000000
3757 0.55024000000000
3758 0.55028000000000
3759 0.55032000000000
....
Part of the results when I say print(y)
3751 -51.101839657065
3752 -51.251119837268
3753 -51.400516517531
3754 -51.550029980720
3755 -51.699660509792
3756 -51.849408387787
3757 -51.999273897829
....
If you are working in a Jupyter notebook, you may need to add %matplotlib inline after the import.
import matplotlib.pyplot as plt
%matplotlib inline
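Independently of the display issue, the row range itself can be written more explicitly with .iloc, which always slices by position. The sketch below uses synthetic data as a stand-in for RawPhaseData.txt (the numbers are made up), so only the slicing and plotting calls carry over.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the file in the question (assumption).
n = 7000
textfile = pd.DataFrame({
    "freq": 0.4 + 0.00004 * np.arange(n),
    "phase": np.linspace(0.0, -100.0, n),
})

p = textfile.iloc[3751:6252]          # rows 3751 .. 6251, the end is exclusive
ax = p.plot(x="freq", y="phase")      # freq on the x axis, phase on the y axis
print(len(p), p.index[0], p.index[-1])
```

Outside a notebook you still need plt.show() (from matplotlib.pyplot) to open the window.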

Only plotting observed dates in matplotlib, skipping range of dates

I have a simple dataframe I am plotting in matplotlib. However, the plot is showing the range of the dates, rather than just the two observed data points.
How can I only plot the two data points and not the range of the dates?
df structure:
Date Number
2018-01-01 12:00:00 1
2018-02-01 12:00:00 2
Output of the matplotlib code:
Here is what I expected (this was done using a string and not a date on the x-axis data):
df code:
import pandas as pd
df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00',2]], columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index(['Date'],inplace=True)
Plot code:
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots(
    figsize=(4,5),
    dpi=72
)
width = 0.75
#starts the bar chart creation
ax1.bar(df.index, df['Number'],
    width,
    align='center',
    color=('#666666', '#333333'),
    edgecolor='#FF0000',
    linewidth=2
)
ax1.set_ylim(0,3)
ax1.set_ylabel('Score')
fig.autofmt_xdate()
#Title
plt.title('Scores by group and gender')
plt.tight_layout()
plt.show()
Try adding something like:
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%y-%m-%d')
ax1.xaxis.set_major_formatter(myFmt)
plt.xticks(df.index)
I think the dates are converted to numbers (in days) at plot time, so width = 0.75 means three quarters of a day, which is very thin next to a gap of a whole month. Try something bigger (like width = 20).
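A minimal sketch of the width suggestion, assuming the two dates from the question: on a date axis one unit corresponds to one day, so the two bars sit 31 units apart and a width of 20 fills most of that gap.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(['2018-01-01 12:00:00', '2018-02-01 12:00:00']).to_pydatetime()

# On a date axis one unit is one day, so this is the distance between the bars.
gap_in_days = mdates.date2num(dates[1]) - mdates.date2num(dates[0])

fig, ax1 = plt.subplots(figsize=(4, 5))
bars = ax1.bar(dates, [1, 2], width=20)   # 20 days wide instead of 0.75
fig.autofmt_xdate()
print(gap_in_days, bars[0].get_width())
```

The x axis still spans the whole month, though. This only makes the bars visible and does not turn the axis into a categorical one.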
Matplotlib bar plots are numeric in nature. If you want a categorical bar plot instead, you may use pandas bar plots.
df.plot.bar()
You may then want to beautify the labels a bit
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00',2]], columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index(['Date'],inplace=True)
ax = df.plot.bar()
ax.tick_params(axis="x", rotation=0)
ax.set_xticklabels([t.get_text().split()[0] for t in ax.get_xticklabels()])
plt.show()
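As a variation on the label clean-up, the tick labels can also be built directly from the index with strftime instead of splitting the label text. This is a sketch under the assumption that a plain date is what you want to show; the format string is my choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([['2018-01-01 12:00:00', 1], ['2018-02-01 12:00:00', 2]],
                  columns=['Date', 'Number'])
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

ax = df.plot.bar()
ax.tick_params(axis="x", rotation=0)
# Build the labels from the index itself; the format string is an assumption.
ax.set_xticklabels(df.index.strftime('%Y-%m-%d').tolist())
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Call plt.show() afterwards to display the figure.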