Adding stripplot (dotplot) to grouped boxplot - Pandas and Seaborn

I am using seaborn for the first time and am trying to make a nested (grouped) boxplot with the data points added as dots. Here is my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.set(style="ticks")
## Draw a nested boxplot to show bills by day and smoker status
sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips,width=0.5,palette="PRGn",linewidth=1)
## Draw a split strip plot
sns.stripplot(x="day", y="total_bill", hue="smoker",palette="PRGn",data=tips,size=4,edgecolor="gray",
split=True)
sns.despine(offset=10, trim=True)
plt.show()
And the figure:
You can see that the dots are not centered on the boxes, because of the 'width' parameter used in the boxplot call. Is there any way I can align the dots with the boxes?
p.s. - I have added the MCVE as mentioned by tom.
Bade

The distance between groups is computed automatically and there's no simple way to change it that I am aware of, but you are using an indirect way to modify it in the boxplot: the keyword width. Use the default value and everything will align.
sns.set(style="ticks")
sns.boxplot(x="day", y="total_bill", hue="smoker", data=tips,
palette="PRGn", linewidth=1)
sns.stripplot(x="day", y="total_bill", hue="smoker", data=tips,
palette="PRGn", size=4, edgecolor="gray", split=True)
sns.despine(offset=10, trim=True)
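A note for newer seaborn releases: the `split` keyword of `stripplot` was renamed to `dodge`. Below is a minimal sketch of the same fix written with `dodge=True`. It assumes a reasonably recent seaborn and uses a small made-up DataFrame (the column names mirror `tips`, the values are invented) so that it runs without downloading anything.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the tips dataset (values are illustrative only)
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Fri"] * 2,
    "smoker": ["Yes", "No"] * 8,
    "total_bill": [10.5, 12.0, 18.2, 20.1, 15.3, 9.8, 22.4, 17.6,
                   11.1, 13.4, 19.0, 21.7, 14.2, 10.9, 23.5, 16.8],
})

sns.set(style="ticks")
fig, ax = plt.subplots()

# Leave the boxplot width at its default so both layers use the same offsets
sns.boxplot(x="day", y="total_bill", hue="smoker", data=demo,
            palette="PRGn", linewidth=1, ax=ax)

# dodge=True separates the points by hue level, as split=True used to do
sns.stripplot(x="day", y="total_bill", hue="smoker", data=demo,
              palette="PRGn", size=4, edgecolor="gray", dodge=True, ax=ax)

sns.despine(offset=10, trim=True)
```

Call `plt.show()` afterwards to display the figure. Both calls add legend entries, so the legend may list each hue level twice.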

Related

Is there a way to draw shapes on a python pandas plot

I am creating shot plots for NHL games and I have succeeded in making the plot, but I would like to draw the lines that you see on a hockey rink on it. I basically just want to draw two circles and two lines on the plot like this.
Let me know if this is possible and, if so, how I could do it.
A pandas plot is in fact a matplotlib plot: you can assign the returned Axes to a variable and modify it according to your needs (add horizontal and vertical lines, shapes, text, etc.).
# plot your data, but instead of displaying it, assign the returned Axes to a variable
ax = df.plot()
ax.vlines(x, ymin, ymax, colors='k', linestyles='solid') # adjust to your needs
plt.show()
Working code sample:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection
df = seaborn.load_dataset('tips')
ax = df.plot.scatter(x='total_bill', y='tip')
ax.vlines(x=40, ymin=0, ymax=20, colors='red')
patches = [Circle((50,10), radius=3)]
collection = PatchCollection(patches, alpha=0.4)
ax.add_collection(collection)
plt.show()
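For the rink markings specifically (two circles and two lines), the same idea can be sketched with `ax.axvline` and `ax.add_patch`. The shot coordinates, line positions, and circle centers below are all made up for illustration; substitute the real rink dimensions and your own DataFrame.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Circle

# Made-up shot coordinates; replace with your own DataFrame
shots = pd.DataFrame({"x": [-60, -35, 10, 42, 70], "y": [5, -20, 12, -8, 25]})

ax = shots.plot.scatter(x="x", y="y")

# Two vertical lines (hypothetical positions for the blue lines)
for x_pos in (-25, 25):
    ax.axvline(x=x_pos, color="blue", linewidth=2)

# Two unfilled circles (hypothetical face-off circle positions)
for center in ((-69, 0), (69, 0)):
    ax.add_patch(Circle(center, radius=15, fill=False, edgecolor="red"))

ax.set_aspect("equal")  # keep the circles round
```

Using `fill=False` keeps the shots underneath the circles visible. Call `plt.show()` to display the result.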

Can't make violin chart appear in subplot

Hi guys, I don't know why the third plot below (violin) is not appearing in the third subplot space (it's empty). Could you please assist?
Code:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(1,3)
fig.suptitle("Age Distribution", fontsize=15)
sns.distplot(insurance_ds["age"], ax=ax[0])
insurance_ds.boxplot(column=["age"], ax=ax[1])
sns.catplot(data=insurance_ds, kind="violin", y="age", ax=ax[2]) #->>>Not showing in third plot space
catplot() creates a new figure, and cannot be used to plot on a subplot.
You want to use sns.violinplot() directly.

Creating a grouped bar plot with Seaborn

I am trying to create a grouped bar graph using Seaborn but I am getting a bit lost in the weeds. I actually have it working but it does not feel like an elegant solution. Seaborn only seems to support clustered bar graphs when there is a binary option such as Male/Female. (https://seaborn.pydata.org/examples/grouped_barplot.html)
It does not feel right having to fall back onto matplotlib so much - using the subplots feels a bit dirty :). Is there a way of handling this completely in Seaborn?
Thanks,
Andrew
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
dataframe = pd.read_csv("https://raw.githubusercontent.com/mooperd/uk-towns/master/uk-towns-sample.csv")
dataframe = dataframe.groupby(['nuts_region']).agg({'elevation': ['mean', 'max', 'min'],
'nuts_region': 'size'}).reset_index()
dataframe.columns = list(map('_'.join, dataframe.columns.values))
# We need to melt our dataframe down into a long format.
tidy = dataframe.melt(id_vars='nuts_region_').rename(columns=str.title)
# Create a subplot. A Subplot makes it convenient to create common layouts of subplots.
# https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax1 = plt.subplots(figsize=(6, 6))
# https://stackoverflow.com/questions/40877135/plotting-two-columns-of-dataframe-in-seaborn
g = sns.barplot(x='Nuts_Region_', y='Value', hue='Variable', data=tidy, ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
I'm not sure why you need seaborn. Your data is wide format, so pandas does it pretty well without the need for melting:
from matplotlib import rcParams
sns.set(style="whitegrid")
rcParams.update({'figure.autolayout': True})
fig, ax1 = plt.subplots(figsize=(12,6))
dataframe.plot.bar(x='nuts_region_', ax=ax1)
plt.tight_layout()
plt.xticks(rotation=45, ha="right")
plt.show()
Output:
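If the goal is to avoid the explicit `plt.subplots` call, note that `DataFrame.plot.bar` creates the figure itself and returns the Axes. A small self-contained sketch with made-up wide-format data (the region names and elevation values are invented, not taken from the CSV above):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up wide-format summary: one row per region, one column per statistic
wide = pd.DataFrame({
    "region": ["North", "South", "West"],
    "elevation_mean": [120.0, 85.5, 140.2],
    "elevation_max": [310.0, 190.0, 420.0],
    "elevation_min": [5.0, 2.0, 12.0],
})

# Each non-x column becomes one bar per region, grouped side by side
ax = wide.plot.bar(x="region", rot=45, figsize=(12, 6))
ax.set_ylabel("Elevation")
plt.tight_layout()
```

The grouping is not limited to two series; every additional column adds another bar to each group.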

How to prevent seaborn from skipping years in xtick labels in a time series plot

I have included a screenshot of the plot. Is there a way to prevent seaborn from skipping the xtick labels in time series data?
Most seaborn functions return a matplotlib object, so you can control the number of major ticks displayed via matplotlib. By default, matplotlib picks the tick locations automatically, which is why it hides some year labels. You can try setting a MaxNLocator.
Consider the following example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load data
df = sns.load_dataset('flights')
df.drop_duplicates('year', inplace=True)
df.year = df.year.astype('str')
# plot
fig, ax = plt.subplots(figsize=(5, 2))
sns.lineplot(x='year', y='passengers', data=df, ax=ax)
ax.xaxis.set_major_locator(plt.MaxNLocator(5))
This gives you:
ax.xaxis.set_major_locator(plt.MaxNLocator(10))
will give you
I agree with @steven's answer; I just want to add that methods for setting ticks directly, such as plt.xticks or ax.xaxis.set_ticks, seem more natural to me. Full details can be found here.
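A minimal sketch of the explicit-ticks approach, using `ax.set_xticks` on numeric years. The passenger numbers are illustrative stand-ins for the flights data, and plain matplotlib is used for the line; the same call should work on the Axes returned by `sns.lineplot` when the x values are numeric.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative yearly data standing in for the flights example
df = pd.DataFrame({
    "year": list(range(1949, 1961)),
    "passengers": [112, 115, 145, 171, 196, 204, 242, 284, 315, 340, 360, 417],
})

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(df["year"], df["passengers"])

# Place one major tick on every year instead of letting matplotlib choose
ax.set_xticks(df["year"].tolist())
ax.tick_params(axis="x", labelrotation=45)  # rotate labels to avoid overlap
```

Unlike `MaxNLocator`, which only caps the number of ticks, this pins the ticks to exactly the values you pass in.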

Customize the color of a bar chart when reading from two different data frames in seaborn

I have plotted a bar chart using the code below:
dffinal['CI-noCI']='Cognitive Impairement'
nocidffinal['CI-noCI']='Non Cognitive Impairement'
res=pd.concat([dffinal,nocidffinal])
sns.barplot(x='6month',y='final-formula',data=res,hue='CI-noCI')
plt.xticks(fontsize=8, rotation=45)
plt.show()
The result is as below:
I want to change their colors to red and green.
How can I do that?
Just for information, this plot reads from two different data frames.
The links I have gone through all dealt with a single data frame, so they did not apply to my case.
Thanks :)
You can use matplotlib to override Seaborn's default color cycle so that the hues it uses are red and green.
import matplotlib.pyplot as plt
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'rg')")
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'date': [1,2,3,4,4,5],
'value': [10,15,35,14,18,4],
'hue_v': [1,1,2,1,2,2]})
# The normal seaborn coloring is blue and orange
sns.barplot(x='date', y='value', data=df, hue='hue_v')
# Now change the color cycling and re-make the same plot:
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'rg')")
sns.barplot(x='date', y='value', data=df, hue='hue_v')
This will now affect all of the other figures you make, so if you want to restore the original settings for all other plots you then need to do:
sns.reset_orig()
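An alternative that leaves the global settings alone is to pass the colors through the `palette` argument of `sns.barplot`, which accepts a dict mapping each hue level to a color. A minimal sketch, with two made-up frames standing in for `dffinal` and `nocidffinal` (the numbers are invented):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up frames standing in for dffinal and nocidffinal
dffinal = pd.DataFrame({"6month": [1, 2, 3], "final-formula": [0.4, 0.6, 0.5]})
nocidffinal = pd.DataFrame({"6month": [1, 2, 3], "final-formula": [0.7, 0.8, 0.9]})
dffinal["CI-noCI"] = "Cognitive Impairment"
nocidffinal["CI-noCI"] = "Non Cognitive Impairment"
res = pd.concat([dffinal, nocidffinal], ignore_index=True)

# Keys must match the hue values exactly; only this plot is affected
colors = {"Cognitive Impairment": "red", "Non Cognitive Impairment": "green"}
ax = sns.barplot(x="6month", y="final-formula", hue="CI-noCI",
                 data=res, palette=colors)
plt.xticks(fontsize=8, rotation=45)
```

Once the frames are concatenated, seaborn only sees one DataFrame, so the fact that the data came from two sources does not matter for the coloring.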