Plotting Combo plots with multiple lines and multiple bars - matplotlib

I have a small dataset I'd like to visualize. Essentially, I would like to create a combo plot with two lines and two bars for the same category, but I can't find any information on this type of plot within matplotlib or seaborn. It may be that I just don't know how to ask properly.
I have a data set called "df_shr_by_grpx" that contains the following data:
[image: Share by Form]
I used the following code to get nice combo plots of pct lives (bars) and shares (line) for each of the two different metric values by category:
import pandas as pd
import matplotlib.pyplot as plt
prod_sel='PRODUCT1'
chan_sel='CHANNEL1'
met_sel='NC'
pctsize_color='orange'
shr_color='blue'
sem=pd.DataFrame(df_shr_by_grpx[(df_shr_by_grpx['prod']==prod_sel) & (df_shr_by_grpx['channel']==chan_sel) & (df_shr_by_grpx['metric']==met_sel+'_Pct_Grp')])
sem['pctlives1']=(round(sem['pctlives']*100,0)).astype(int)
fig, ax = plt.subplots(figsize=(10,7))
# Save the chart so we can loop through the bars below.
bars = ax.bar(
    x=sem['category'],
    height=sem['pctlives1'],
    color=[pctsize_color]
)
# Axis formatting.
ax.spines['top'].set_visible(False)
ax.yaxis.label.set_color(pctsize_color)
ax.tick_params(axis='y',colors=pctsize_color, which='both')
ax.spines['bottom'].set_color('#DDDDDD')
ax.tick_params(bottom=False, left=False)
ax.set_axisbelow(True)
ax.yaxis.grid(True, color='#EEEEEE')
ax.xaxis.grid(False)
ax.set_ylim(0, 105)
ax2 = ax.twinx()
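# Dashed line with round markers; the color comes only from shr_color
# (the original '--bo' format string also set blue, which conflicts with color=).
ax2.plot(sem['category'], sem['shr'], linestyle='--', marker='o', color=shr_color)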
ax2.set_ylabel('Share', labelpad=15, color='blue', fontsize=16)
ax2.tick_params(labelsize=14)
ax2.yaxis.label.set_color(shr_color)
ax2.tick_params(colors=shr_color, which='both')
ax2.set_ylim(0, 1.2)
# zip joins x and y coordinates in pairs
for x,y in zip(sem['category'],sem['shr']):
    label = "{:.1%}".format(y)
    plt.annotate(label,
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center', color=shr_color, weight='bold', fontsize=16)
# Add text annotations to the top of the bars.
bar_color = bars[0].get_facecolor()
for bar in bars:
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 1,
        # Format the height directly so this also works for plain Python numbers.
        '{:.0f}%'.format(bar.get_height()),
        horizontalalignment='center',
        color=bar_color,
        weight='bold',
        fontsize=16
    )
# Add labels and a title.
ax.set_xlabel('Category', labelpad=15, color='#333333', fontsize=16)
ax.set_ylabel('Pct of Lives', labelpad=15, color=pctsize_color, fontsize=16)
ax.tick_params(labelsize=14)
ax.set_title('Pct of Lives calculated within '+met_sel+' Group', pad=15, color='#333333',
             weight='bold', style='italic', fontsize=10)
fig.suptitle(prod_sel+' Pct of Lives & Share '+met_sel+' Group', color='#333333',
             weight='bold', fontsize=20)
fig.tight_layout(pad=3)
The results (changing the colors for each) are:
[image: NC group]
[image: Covered group]
What I would like to know is how to generate a combo plot that puts both of these in one view. Ideally, I'd like to get something that looks like this (though in this case I'm plotting the actual lives count instead of percent of lives):
[image: Combo plot (from Excel)]
But I can't find any information on multiple bars and lines in a combo plot.
TIA for any help you can provide!
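One possible approach, sketched here under stated assumptions rather than as a definitive answer: draw the two bar series side by side on the primary axes by shifting their x positions, then draw both lines on a twin axes. The category names, numbers, and the dictionary names (pct_lives, share, bar_colors, line_colors) are made up for illustration; they stand in for the 'NC' and 'Covered' subsets of df_shr_by_grpx.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (assumption): not taken from the question's dataset.
categories = ["A", "B", "C", "D"]
pct_lives = {"NC": [40, 25, 20, 15], "Covered": [30, 35, 25, 10]}
share = {"NC": [0.12, 0.30, 0.45, 0.22], "Covered": [0.18, 0.41, 0.52, 0.35]}
bar_colors = {"NC": "orange", "Covered": "green"}
line_colors = {"NC": "blue", "Covered": "purple"}

x = np.arange(len(categories))  # numeric positions, one per category
width = 0.38                    # width of a single bar

fig, ax = plt.subplots(figsize=(10, 7))
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

for i, group in enumerate(pct_lives):
    # Shift the first group left and the second group right of the tick.
    offset = (i - 0.5) * width
    ax.bar(x + offset, pct_lives[group], width,
           color=bar_colors[group], label=group + " pct of lives")
    ax2.plot(x, share[group], linestyle="--", marker="o",
             color=line_colors[group], label=group + " share")

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.set_ylim(0, 105)
ax2.set_ylim(0, 1.2)
ax.set_xlabel("Category")
ax.set_ylabel("Pct of Lives")
ax2.set_ylabel("Share")

# Merge the legend entries of both axes into a single legend.
handles1, labels1 = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles1 + handles2, labels1 + labels2, loc="upper left")

fig.tight_layout()
fig.savefig("combo_plot.png")
```

The key design choice is using numeric x positions instead of the category strings, since that is what makes the left/right offset for the two bar groups possible; the category names are put back with set_xticklabels.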

Related

overlapping two plots in matplotlib

I have two plots generated using matplotlib. The first represents my background and the second a group of points which I want to show. Is there a way to overlap the two plots?
background:
import geopandas as gpd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize = (10,10))
grid_duomo = gpd.read_file('/content/Griglia_2m-SS.shp')
grid_duomo.to_crs(epsg=32632).plot(ax=ax, color='lightgrey')
points:
fig = plt.figure(figsize=(10, 10))
ids = traj_collection_df_new_app['id'].unique()
for id_ in ids:
    self_id = traj_collection_df_new_app[traj_collection_df_new_app['id'] == id_]
    plt.plot(
        self_id['lon'],
        self_id['lat'],
        # markers= 'o',
        # markersize=12
    )
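plt.plot() always draws on the current Axes, i.e. the one most recently created or activated. In your code, the second plt.figure() call creates a new, empty figure, so the points end up there instead of on the background.
It's practically the same as plt.gca().plot(), where plt.gca() stands for "get current axes".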
To get full control over which axis is used, you should do something like this:
(the zorder argument is used to set the "vertical stacking" of the artists, e.g. zorder=2 will be plotted on top of zorder=1)
f = plt.figure() # create a figure
ax = f.add_subplot( ... ) # create an axis in the figure f
grid_duomo.plot(ax=ax, ..., zorder=0) # background gets the lowest zorder
ax.plot(..., zorder=1) # points are drawn on top of the background
# you can then continue to add more axes to the same figure using
# f.add_subplot() or f.add_axes()
(if this is unclear, maybe check the quick_start guide of matplotlib? )
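As a minimal sketch of the idea above, the following draws a background and several lines on one explicitly created Axes. The grey square and the random trajectories are stand-ins (assumptions) for the shapefile grid and the trajectory dataframe from the question, so that the example runs without geopandas or the original data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 6))  # one figure, one Axes

# Stand-in for the background grid: a filled grey square with the lowest zorder.
ax.fill([0, 10, 10, 0], [0, 0, 10, 10], color="lightgrey", zorder=0)

# Stand-in for the trajectories: one line per id, drawn on the SAME Axes.
rng = np.random.default_rng(0)
for traj_id in range(3):
    lon = np.linspace(1, 9, 20)
    lat = 5 + np.cumsum(rng.normal(0, 0.3, size=20))
    ax.plot(lon, lat, marker="o", markersize=4, zorder=1, label="id " + str(traj_id))

ax.legend()
fig.savefig("overlap.png")
```

Because every call goes through ax, nothing depends on which Axes pyplot currently considers active.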

How to determine the matplotlib legend?

I have 3 lists to plot as curves. But every time I run the same plt lines, even with ax.legend(loc='lower right', handles=[line1, line2, line3]), the 3 lists change position randomly in the legend, as shown below. Is it possible to fix their order and colors, both in the legend and for the curves in the plot?
EDIT:
My code is as below:
import numpy as np
import matplotlib.pyplot as plt

def plot_with_fixed_list(n, **kwargs):
    np.random.seed(0)
    fig, ax1 = plt.subplots()
    my_handles = []
    for key, values in kwargs.items():
        value_name = key
        temp, = ax1.plot(np.arange(1, n+ 1, 1).tolist(), values, label=value_name)
        my_handles.append(temp)
    ax1.legend(loc='lower right', handles=my_handles)
    ax1.grid(True, which='both')
    plt.show()

plot_with_fixed_list(300, FA_Hybrid=fa, BP=bp, Ssym_Hybrid=ssym)
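This nondeterminism showed up with python==3.5 and matplotlib==3.0.0. The most likely cause is that, before Python 3.6, the order of **kwargs was not guaranteed to be preserved (PEP 468 changed that), so kwargs.items() could yield the curves in a different order on each run. After I updated to python==3.6 and matplotlib==3.3.2, the problem was solved.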
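If upgrading is not an option, one way to avoid relying on keyword-argument order at all is to pass the curves as an explicit list and pin the colors by name. This is only a sketch: plot_in_fixed_order, the colors mapping, and the random data are hypothetical and not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_in_fixed_order(n, series, colors):
    """Plot (label, values) pairs in list order, with one fixed color per label."""
    fig, ax = plt.subplots()
    x = np.arange(1, n + 1)
    handles = []
    for label, values in series:  # a list keeps its order on every Python version
        (line,) = ax.plot(x, values, label=label, color=colors[label])
        handles.append(line)
    ax.legend(handles=handles, loc="lower right")
    ax.grid(True, which="both")
    return fig, ax

# Placeholder data (assumption): random walks instead of the real fa, bp, ssym lists.
rng = np.random.default_rng(0)
n = 300
series = [
    ("FA_Hybrid", np.cumsum(rng.random(n))),
    ("BP", np.cumsum(rng.random(n))),
    ("Ssym_Hybrid", np.cumsum(rng.random(n))),
]
colors = {"FA_Hybrid": "tab:blue", "BP": "tab:orange", "Ssym_Hybrid": "tab:green"}

fig, ax = plot_in_fixed_order(n, series, colors)
fig.savefig("fixed_legend.png")
```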

Pandas set labeling legend from groupby elements

I'm plotting a kde distribution of 2 dataframes on the same axis, and I need to set a legend saying which line is which dataframe. Now, this is my code:
fig, ax = plt.subplots(figsize=(15,10))
for label, df in dataframe1.groupby('ID'):
    dataframe1.Value.plot(kind="kde", ax=ax,color='r')
for label, df in dataframe2.groupby('ID'):
    dataframe2.Value.plot(kind='kde', ax=ax, color='b')
plt.legend()
plt.title('title here', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('mmHg', fontsize=16)
plt.show()
But the result is this:
How can I show the legends inside the graph as 'values from df1' and 'results from df2'?
Edit:
With the following code I get the result I asked for. But for some dataframes I get the following results:
fig, ax = plt.subplots(figsize=(15,10))
sns.kdeplot(akiPEEP['Value'], color="r", label='type 1', ax=ax)
sns.kdeplot(noAkiPEEP['Value'], color="b",label='type 2', ax=ax)
plt.legend()
plt.title('d', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('value', fontsize=16)
plt.show()
A distribution I'm plotting now:
How do I fix this? Also, is it a good idea to plot the rolling means over this distribution as well, or does the plot become too heavy?
I'm not sure I understand your question, but from your code, it looks like you are trying to plot one KDE per ID value in your dataframes. In which case you would have to do:
for label, df in dataframe1.groupby('ID'):
    df.Value.plot(kind="kde", ax=ax,color='r', label=label)
Notice that I replaced dataframe1 with df in the body of the for-loop. df corresponds to the sub-dataframe in which all the elements in the column ID have the value label.
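A self-contained sketch of the same loop, under a few assumptions: the dataframe is filled with made-up values, and a density histogram (kind="hist") is swapped in for kind="kde" so the example does not depend on SciPy. The point it illustrates is unchanged: plot the per-group df, not the full dataframe, and pass the group key as the label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data (assumption): two IDs with different value distributions.
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame({
    "ID": np.repeat(["a", "b"], 100),
    "Value": np.concatenate([rng.normal(5, 1, 100), rng.normal(8, 1.5, 100)]),
})

fig, ax = plt.subplots(figsize=(8, 5))
for label, df in dataframe1.groupby("ID"):
    # df holds only the rows of the current group; label is that group's ID.
    df["Value"].plot(kind="hist", bins=20, density=True, alpha=0.5,
                     ax=ax, label="ID " + str(label))

ax.legend()  # picks up one entry per group from the label arguments
ax.set_xlabel("Value")
fig.savefig("grouped_distributions.png")
```

With SciPy installed, replacing kind="hist", bins=20, density=True with kind="kde" should give the smoothed curves from the question.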

fig.tight_layout() but plots still overlap

Imagine I have some dataset for wines and I find the top 5 wine producing countries:
# Find top 5 wine producing countries.
top_countries = wines_df.groupby('country').size().reset_index(name='n').sort_values('n', ascending=False)[:5]['country'].tolist()
Now that I have the values, I attempt to plot the results in 10 plots, 5 rows 2 columns.
fig = plt.figure(figsize=(16, 15))
fig.tight_layout()
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
plt.show()
Even with this result, I still have my subplots overlapping.
Am I doing something wrong? Using python3.6 with matplotlib==2.2.2
As Thomas Kühn said, you have to move tight_layout() after doing the plots, like in:
fig = plt.figure(figsize=(16, 15))
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
fig.tight_layout()
plt.show()
If it is still overlapping (this may happen in rare cases), you can specify the padding with:
fig.tight_layout(pad=0., w_pad=0.3, h_pad=1.0)
Here pad is the general padding, w_pad is the horizontal padding and h_pad is the vertical padding. Just try some values until your plot looks good. (pad=0., w_pad=.3, h_pad=.3) is a good start if you want your plots as tight as possible.
Another possibility is to specify constrained_layout=True in the figure:
fig = plt.figure(figsize=(16, 15), constrained_layout=True)
Now you can delete the line fig.tight_layout().
edit:
One more thing I stumbled upon:
It seems like you are specifying your figsize so that it fits on a standard DIN A4 paper in centimeters (typical textwidth: 16cm). But figsize in matplotlib is in inches. So probably replacing the figsize with figsize=(16/2.54, 15/2.54) might be better.
I know that it is absolutely confusing that matplotlib internally uses inches as units, considering that it is mostly the scientific community and data engineers working with matplotlib (and these usually use SI units). As ImportanceOfBeingErnest pointed out, there are several discussions going on about how to implement other units than inches.
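To make the two points above concrete, here is a small sketch that builds a 5x2 grid, calls tight_layout() only after all subplots exist, and converts a size given in centimeters to the inches that figsize expects. The helper cm_to_inch, the 40 x 38 cm size, and the random data are assumptions for illustration; the wine dataset is not used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def cm_to_inch(*lengths_cm):
    """Convert lengths in centimeters to inches (figsize is given in inches)."""
    return tuple(length / 2.54 for length in lengths_cm)

fig = plt.figure(figsize=cm_to_inch(40, 38))
rng = np.random.default_rng(0)

for row in range(5):
    ax_left = fig.add_subplot(5, 2, 2 * row + 1)
    ax_right = fig.add_subplot(5, 2, 2 * row + 2)
    # Placeholder data standing in for the 'points' and 'price' columns.
    ax_left.hist(rng.normal(88, 3, 200), bins=20)
    ax_left.set_title("POINTS, group %d" % (row + 1))
    ax_right.boxplot(rng.lognormal(3, 0.5, 200))
    ax_right.set_title("PRICE, group %d" % (row + 1))

# Call tight_layout() last, so it can take every subplot and title into account.
fig.tight_layout(pad=1.0, w_pad=0.3, h_pad=1.0)
fig.savefig("wine_grid.png")
```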

Customizing subplots in matplotlib

I want to place 3 plots using subplots. Two plots on the top row and one plot that will occupy the entire second row.
My code creates a gap between the top two plots and the lower plot. How can I correct that?
df_CI
Country China India
1980 5123 8880
1981 6682 8670
1982 3308 8147
1983 1863 7338
1984 1527 5704
fig = plt.figure() # create figure
ax0 = fig.add_subplot(221) # add subplot 1 (2 row, 2 columns, first plot)
ax1 = fig.add_subplot(222) # add subplot 2 (2 row, 2 columns, second plot).
ax2 = fig.add_subplot(313) # a 3 digit number where the hundreds represent nrows, the tens represent ncols
# and the units represent plot_number.
# Subplot 1: Box plot
df_CI.plot(kind='box', color='blue', vert=False, figsize=(20, 20), ax=ax0) # add to subplot 1
ax0.set_title('Box Plots of Immigrants from China and India (1980 - 2013)')
ax0.set_xlabel('Number of Immigrants')
ax0.set_ylabel('Countries')
# Subplot 2: Line plot
df_CI.plot(kind='line', figsize=(20, 20), ax=ax1) # add to subplot 2
ax1.set_title ('Line Plots of Immigrants from China and India (1980 - 2013)')
ax1.set_ylabel('Number of Immigrants')
ax1.set_xlabel('Years')
# Subplot 3: Bar plot
df_CI.plot(kind='bar', figsize=(20, 20), ax=ax2) # add to subplot 3
ax2.set_title('Bar Plots of Immigrants from China and India (1980 - 2013)')
ax2.set_xlabel('Years')
ax2.set_ylabel('Number of Immigrants')
plt.show()
I've always found subplots syntax a little difficult.
With these calls
ax0 = fig.add_subplot(221)
ax1 = fig.add_subplot(222)
you're dividing your figure in a 2x2 grid and filling the first row.
ax2 = fig.add_subplot(313)
Now you're dividing it in three rows and filling the last one.
You're basically creating two independent subplot grids, and there is no easy way to define how the subplots of one are spaced with respect to the other.
A much easier and pythonic way is using gridspec to create a single finer grid and address it with python slicing.
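import matplotlib as mpl
import matplotlib.gridspec  # makes mpl.gridspec available explicitly
import matplotlib.pyplot as plt

fig = plt.figure()
gs = mpl.gridspec.GridSpec(2, 2, wspace=0.25, hspace=0.25) # 2x2 grid
ax0 = fig.add_subplot(gs[0, 0]) # first row, first col
ax1 = fig.add_subplot(gs[0, 1]) # first row, second col
ax2 = fig.add_subplot(gs[1, :]) # full second row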
And now you can also easily tune spacing with wspace and hspace.
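Applied to the layout from the question, a sketch could look like the following. It uses only the five rows of df_CI shown in the question and plain matplotlib calls instead of df_CI.plot(...), which is a simplification chosen here to keep the example free of a pandas dependency; the figure size and spacing values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

# The five rows of df_CI shown in the question.
years = [1980, 1981, 1982, 1983, 1984]
china = [5123, 6682, 3308, 1863, 1527]
india = [8880, 8670, 8147, 7338, 5704]

fig = plt.figure(figsize=(12, 10))
gs = GridSpec(2, 2, figure=fig, wspace=0.25, hspace=0.3)  # one shared 2x2 grid
ax0 = fig.add_subplot(gs[0, 0])  # first row, first column
ax1 = fig.add_subplot(gs[0, 1])  # first row, second column
ax2 = fig.add_subplot(gs[1, :])  # second row, spanning both columns

# Subplot 1: box plot, one box per country.
ax0.boxplot([china, india])
ax0.set_xticks([1, 2])
ax0.set_xticklabels(["China", "India"])
ax0.set_ylabel("Number of Immigrants")

# Subplot 2: line plot over the years.
ax1.plot(years, china, label="China")
ax1.plot(years, india, label="India")
ax1.set_xticks(years)
ax1.set_xlabel("Years")
ax1.legend()

# Subplot 3: grouped bar plot across the full width.
x = np.arange(len(years))
width = 0.4
ax2.bar(x - width / 2, china, width, label="China")
ax2.bar(x + width / 2, india, width, label="India")
ax2.set_xticks(x)
ax2.set_xticklabels([str(year) for year in years])
ax2.set_xlabel("Years")
ax2.legend()

fig.savefig("gridspec_layout.png")
```

Since all three axes come from the same GridSpec, the gap between the rows is controlled by the single hspace value.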
More complex layouts are also a lot easier; it's just the familiar slicing syntax.
fig = plt.figure()
gs = mpl.gridspec.GridSpec(10, 10, wspace=0.25, hspace=0.25)
fig.add_subplot(gs[2:8, 2:8])
fig.add_subplot(gs[0, :])
for i in range(5):
    fig.add_subplot(gs[1, (i*2):(i*2+2)])
fig.add_subplot(gs[2:, :2])
fig.add_subplot(gs[8:, 2:4])
fig.add_subplot(gs[8:, 4:9])
fig.add_subplot(gs[2:8, 8])
fig.add_subplot(gs[2:, 9])
fig.add_subplot(gs[3:6, 3:6])
# fancy colors
# mpl.cm.get_cmap() was removed in Matplotlib 3.9; mpl.colormaps needs Matplotlib 3.5+.
cmap = mpl.colormaps["viridis"]
naxes = len(fig.axes)
for i, ax in enumerate(fig.axes):
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_facecolor(cmap(float(i)/(naxes-1)))