Creating a grouped bar plot with Seaborn - pandas

I am trying to create a grouped bar graph using Seaborn but I am getting a bit lost in the weeds. I actually have it working but it does not feel like an elegant solution. Seaborn only seems to support clustered bar graphs when there is a binary option such as Male/Female. (https://seaborn.pydata.org/examples/grouped_barplot.html)
It does not feel right having to fall back onto matplotlib so much - using the subplots feels a bit dirty :). Is there a way of handling this completely in Seaborn?
Thanks,
Andrew
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
dataframe = pd.read_csv("https://raw.githubusercontent.com/mooperd/uk-towns/master/uk-towns-sample.csv")
dataframe = dataframe.groupby(['nuts_region']).agg({'elevation': ['mean', 'max', 'min'],
'nuts_region': 'size'}).reset_index()
dataframe.columns = list(map('_'.join, dataframe.columns.values))
# We need to melt our dataframe down into a long format.
tidy = dataframe.melt(id_vars='nuts_region_').rename(columns=str.title)
# Create a subplot. A Subplot makes it convenient to create common layouts of subplots.
# https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.subplots.html
fig, ax1 = plt.subplots(figsize=(6, 6))
# https://stackoverflow.com/questions/40877135/plotting-two-columns-of-dataframe-in-seaborn
g = sns.barplot(x='Nuts_Region_', y='Value', hue='Variable', data=tidy, ax=ax1)
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
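The melt step is what lets hue= work with more than two groups. Here is a minimal sketch on a tiny hand-made frame. The column names and numbers are hypothetical, not the UK towns data. It shows how DataFrame.melt turns each statistic column into rows of a single variable/value pair:

```python
import pandas as pd

# Hypothetical wide-format summary: one row per region, one column per statistic
wide = pd.DataFrame({
    "region": ["North", "South"],
    "elevation_mean": [120.0, 45.0],
    "elevation_max": [900.0, 300.0],
})

# melt keeps "region" as the identifier and stacks the remaining columns,
# one column after another, into "variable" (old column name) and "value"
long = wide.melt(id_vars="region")
print(long)
```

Every non-identifier column ends up in the variable column, so the number of hue levels is simply the number of melted columns.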

I'm not sure why you need seaborn. Your data is already in wide format, so pandas handles it well without any melting:
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
sns.set_theme(style="whitegrid")
rcParams.update({'figure.autolayout': True})
# `dataframe` is the wide-format frame built in the question
fig, ax1 = plt.subplots(figsize=(12, 6))
dataframe.plot.bar(x='nuts_region_', ax=ax1)
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
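To keep the whole thing in seaborn, one option is the figure-level sns.catplot with kind="bar". It creates its own figure, so no plt.subplots() call is needed. This is a sketch on made-up long-format data; the Region/Variable/Value names are assumptions mirroring the melted frame in the question. Note that hue takes as many levels as the data has, not only two:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical long-format data: three regions x three statistics
tidy = pd.DataFrame({
    "Region": ["North"] * 3 + ["South"] * 3 + ["West"] * 3,
    "Variable": ["mean", "max", "min"] * 3,
    "Value": [120, 900, 5, 45, 300, 0, 80, 600, 10],
})

# catplot builds the figure itself; hue accepts any number of levels
g = sns.catplot(data=tidy, kind="bar", x="Region", y="Value", hue="Variable",
                height=4, aspect=1.5)

# Rotate the x tick labels on every facet (there is only one here)
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=45)
plt.show()
```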

Related

How do I plot two graphs of two different dataframes side by side?

I have two DataFrames that have time-series data of BTC. I want to display the graphs side by side to analyze them.
display(data_df.plot(figsize=(15,20)))
display(model_df.plot(figsize=(15,20)))
When I plot them like this, they stack vertically on top of each other. I want them side by side instead.
Here's one way that might work, using subplots (I'm guessing you want a total figsize of 30x20):
import matplotlib.pyplot as plt
fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(30, 20))
data_df.plot(ax=ax0)
model_df.plot(ax=ax1)
plt.show()
You can use matplotlib.pyplot.subplots:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
model_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,4))
for col, ax in zip(data_df, axes):
    data_df[col].plot(ax=ax, label=f"data_df ({col})")
    model_df[col].plot(ax=ax, label=f"model_df ({col})")
    ax.legend()
plt.show()
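When the two frames are meant to be compared, it can help to put both panels on the same y scale. Here is a sketch using the sharey parameter of matplotlib.pyplot.subplots, with random stand-ins for data_df and model_df (hypothetical data):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for data_df and model_df
rng = np.random.default_rng(0)
data_df = pd.DataFrame(rng.integers(0, 100, size=(15, 2)), columns=list("AB"))
model_df = pd.DataFrame(rng.integers(0, 200, size=(15, 2)), columns=list("AB"))

# sharey=True gives both panels the same y-axis limits,
# so the two DataFrames can be compared at a glance
fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, figsize=(12, 4), sharey=True)
data_df.plot(ax=ax0, title="data_df")
model_df.plot(ax=ax1, title="model_df")
plt.show()
```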

Is there a way to draw shapes on a python pandas plot

I am creating shot plots for NHL games and have succeeded in making the plot, but I would like to draw the markings you see on a hockey rink on top of it. I basically just want to draw two circles and two lines on the plot.
Let me know whether this is possible and, if so, how I could do it.
A pandas plot is in fact a matplotlib plot: DataFrame.plot returns a matplotlib Axes, so you can assign it to a variable and modify it as needed (add horizontal and vertical lines, shapes, text, etc.).
# plot your data, but instead of displaying it, assign the Axes (and its Figure) to variables
ax = df.plot()
fig = ax.get_figure()
ax.vlines(x, ymin, ymax, colors='k', linestyles='solid')  # x, ymin, ymax: adjust to your needs
plt.show()
Working code sample:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection
df = seaborn.load_dataset('tips')
ax = df.plot.scatter(x='total_bill', y='tip')
ax.vlines(x=40, ymin=0, ymax=20, colors='red')
patches = [Circle((50,10), radius=3)]
collection = PatchCollection(patches, alpha=0.4)
ax.add_collection(collection)
plt.show()
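Closer to the original request (two circles and two lines), here is a sketch using Axes.axvline for the lines and unfilled matplotlib.patches.Circle objects added with Axes.add_patch. The shot data is random, and the positions and radii are illustrative assumptions, not official rink dimensions:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Hypothetical shot coordinates; real ones would come from your NHL data
rng = np.random.default_rng(1)
shots = pd.DataFrame({"x": rng.uniform(-100, 100, 50),
                      "y": rng.uniform(-42.5, 42.5, 50)})

ax = shots.plot.scatter(x="x", y="y")

# Two vertical lines and two unfilled circles (illustrative positions only)
ax.axvline(-25, color="blue")
ax.axvline(25, color="blue")
ax.add_patch(Circle((-69, 22), radius=15, fill=False, color="red"))
ax.add_patch(Circle((69, -22), radius=15, fill=False, color="red"))
ax.set_aspect("equal")  # keep the circles round
plt.show()
```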

How to prevent seaborn from skipping years in the xtick labels of a time series plot

In my time series plot, seaborn skips some of the year labels on the x axis. Is there a way to prevent seaborn from skipping xtick labels in time series data?
Most seaborn functions return a matplotlib object (the axes-level ones such as lineplot return an Axes), so you can control the number of major ticks displayed through matplotlib. By default, matplotlib chooses tick positions automatically, which is why it hides some year labels; you can try setting a MaxNLocator instead.
Consider the following example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load data
df = sns.load_dataset('flights')
df.drop_duplicates('year', inplace=True)
df.year = df.year.astype('str')
# plot
fig, ax = plt.subplots(figsize=(5, 2))
sns.lineplot(x='year', y='passengers', data=df, ax=ax)
ax.xaxis.set_major_locator(plt.MaxNLocator(5))
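With MaxNLocator(5), the x axis gets at most five tick intervals, so only some of the years are labeled. Raising the limit shows more labels:
ax.xaxis.set_major_locator(plt.MaxNLocator(10))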
I agree with @steven's answer; I just want to add that tick methods such as plt.xticks or ax.xaxis.set_ticks seem more natural to me. Full details can be found here.

How to plot only y-axis using seaborn

I am new to machine learning and want to compare the predicted values with the actual values. I want to plot both sets of data to see whether the values are the same.
Data:
[-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594 , 0.8845673 , -0.19764173, 0.00837658,
1.3515489 , 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967 , 1.82867539, 1.4492189 ]
This is my data sample for both the A and B datasets. I want to plot them together for comparison, preferably using seaborn.
Seaborn provides some elaborate functions for displaying data, and internally it depends heavily on matplotlib. As the requested plot doesn't fall into the categories Seaborn excels at, it seems more appropriate to use matplotlib's scatter directly:
import numpy as np
from matplotlib import pyplot as plt
A = [-0.26112159, 1.84683522, 2.23912728, 1.58848056, 1.28589823,
2.01355579, -0.144594, 0.8845673, -0.19764173, 0.00837658,
1.3515489, 0.18876488, 1.07088203, 1.11333346, 0.99854107,
1.67141781, 1.74938417, 1.17907989, 1.57017018, 2.04269495,
-0.10662102, 0.96283466, -0.01117658, 0.01610438, 1.31111783,
-0.08608504, -0.09535655, -0.0227967, 1.82867539, 1.4492189]
B = A
x = np.arange(len(A))
plt.scatter(x - 0.2, A, marker='o', color='tomato', label='Dataset A')
plt.scatter(x + 0.2, B, marker='o', color='deepskyblue', label='Dataset B')
plt.legend()
plt.show()
To have crosses as markers, use marker='x' or marker='+'.
To draw a swarmplot (very similar to a stripplot) via Seaborn:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A, B = ...
sns.swarmplot(x=np.repeat(['Dataset A', 'Dataset B'], len(A)), y=np.concatenate([A, B]))
plt.show()
A KDE plot can be used to compare the statistical distributions. Here is an example with some noise added to make the two sets a little different:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
# A = ...
A = np.array(A)
B = A + np.random.normal(0,.1, len(A))
sns.kdeplot(A, label='Dataset A')
sns.kdeplot(B, label='Dataset B')
plt.legend()
plt.show()
If you insist on using seaborn, one way is the following. However, stripplot fits such a case very well, because you don't have to pass the x values explicitly.
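import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
x = np.arange(len(A))
sns.scatterplot(x=x, y=A, marker='x', color='orange', label='Dataset A')
sns.scatterplot(x=x, y=np.array(A) + 0.1, marker='x', color='blue', label='Dataset B')  # offset so B is visible
plt.show()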
I just figured out a way to do this; please let me know if there is a better way to plot two arrays for comparison.
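import matplotlib.pyplot as plt
import seaborn as sns
fig, _ax = plt.subplots(1, 2, figsize=(15, 10))  # predictions and y_test come from your model
sns.stripplot(y=predictions, ax=_ax[0])
sns.stripplot(y=y_test, ax=_ax[1], color='red')
plt.show()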
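Another common way to check whether predicted and actual values agree is to plot one against the other and add a y = x reference line: points on the line are exact matches. Here is a sketch with synthetic stand-ins for y_test and predictions (hypothetical data, not the arrays above):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-ins for y_test and predictions
rng = np.random.default_rng(2)
y_test = rng.normal(1.0, 0.8, 30)
predictions = y_test + rng.normal(0, 0.1, 30)

fig, ax = plt.subplots(figsize=(5, 5))
sns.scatterplot(x=y_test, y=predictions, ax=ax)

# Dashed y = x line: a perfect prediction would sit exactly on it
low = min(y_test.min(), predictions.min())
high = max(y_test.max(), predictions.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray")
ax.set_xlabel("actual")
ax.set_ylabel("predicted")
plt.show()
```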

How to use Gridspec with Pandas plot

I would like to configure the subplot sizes using GridSpec, as explained in this question:
Python/Matplotlib - Change the relative size of a subplot
How can I do this if I want to use the pandas DataFrame plot function? Is it possible at all?
You can do it this way.
import pandas as pd
import pandas_datareader.data as web  # pandas.io.data has been moved to the separate pandas-datareader package
import matplotlib.pyplot as plt
aapl = web.DataReader('AAPL', 'yahoo', '2014-01-01', '2015-05-31')
# 3 x 1 grid, position at (1st row, 1st col), take two rows
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
# plot something on this ax1
aapl['Adj Close'].plot(style='r-', ax=ax1)
# plot an exponentially weighted moving average on the same ax1 (pd.ewma was removed; .ewm().mean() replaces it)
aapl['Adj Close'].ewm(span=20).mean().plot(style='k--', ax=ax1)
# get the other ax
ax2 = plt.subplot2grid((3, 1), (2, 0), rowspan=1)
ax2.bar(aapl.index, aapl.Volume)
plt.show()
I don't think you can use gridspec with pandas plot. The pandas plot module is a wrapper around matplotlib's pyplot and does not necessarily implement all of its functionality.
If you inspect the pandas source on GitHub and search for gridspec, you will notice that plot offers no options to configure it:
https://github.com/pydata/pandas
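The pandas plot methods do accept any matplotlib Axes via ax=, so an Axes created from a GridSpec can be used as well. Here is a sketch with synthetic price and volume series (hypothetical stand-ins for the AAPL data), using Figure.add_gridspec with height_ratios to make the top panel twice as tall:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily price and volume series
idx = pd.date_range("2014-01-01", periods=100, freq="D")
rng = np.random.default_rng(3)
prices = pd.Series(100 + rng.normal(0, 1, 100).cumsum(), index=idx)
volume = pd.Series(rng.integers(1_000, 5_000, 100), index=idx)

# 2 x 1 grid whose top row is twice as tall as the bottom row
fig = plt.figure(figsize=(8, 6))
gs = fig.add_gridspec(2, 1, height_ratios=[2, 1])
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])

# pandas draws onto whichever Axes is passed via ax=
prices.plot(ax=ax1, style="r-")
prices.ewm(span=20).mean().plot(ax=ax1, style="k--")
volume.plot(ax=ax2)
plt.show()
```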