Imagine I have a dataset of wines and I find the top 5 wine-producing countries:
# Find top 5 wine producing countries.
top_countries = wines_df.groupby('country').size().reset_index(name='n').sort_values('n', ascending=False)[:5]['country'].tolist()
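As an aside, the same top-5 list can probably be obtained more directly with value_counts(), which counts and sorts in one step. This is a minimal sketch on a small hypothetical wines_df (not the real dataset). The counts are deliberately distinct, because the two approaches may order tied countries differently:

```python
import pandas as pd

# Hypothetical toy data standing in for wines_df; counts are distinct to avoid ties.
wines_df = pd.DataFrame({
    "country": ["Spain"] * 6 + ["US"] * 5 + ["France"] * 4
               + ["Italy"] * 3 + ["Chile"] * 2 + ["Portugal"] * 1
})

# value_counts() counts rows per country and sorts the counts in descending order.
top_countries = wines_df["country"].value_counts().head(5).index.tolist()
print(top_countries)
```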
Now that I have the values, I try to plot the results in 10 subplots arranged in 5 rows and 2 columns:
fig = plt.figure(figsize=(16, 15))
fig.tight_layout()
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i += 1
    ax1 = fig.add_subplot(5, 2, i)
    i += 1
    ax2 = fig.add_subplot(5, 2, i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
plt.show()
Even so, my subplots still overlap.
Am I doing something wrong? I am using Python 3.6 with matplotlib 2.2.2.
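As Thomas Kühn said, you have to call tight_layout() after creating the plots. tight_layout() adjusts the spacing only once, at the moment it is called, based on the axes and labels that exist at that point: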
fig = plt.figure(figsize=(16, 15))
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i += 1
    ax1 = fig.add_subplot(5, 2, i)
    i += 1
    ax2 = fig.add_subplot(5, 2, i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
# call this only after all subplots and titles exist
fig.tight_layout()
plt.show()
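Since wines_df isn't available here, the following is a minimal self-contained sketch of the same fix under stated assumptions. The data are random numbers for five made-up countries. Matplotlib's own hist() and boxplot() stand in for seaborn's kdeplot() and boxplot() to avoid the extra dependency. The key point is unchanged: tight_layout() runs once, after every title exists.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for wines_df: random points/prices for five made-up countries.
rng = np.random.default_rng(0)
countries = ["A", "B", "C", "D", "E"]
wines_df = pd.DataFrame({
    "country": np.repeat(countries, 50),
    "points": rng.normal(88, 3, 250),
    "price": rng.lognormal(3, 0.5, 250),
})

fig, axes = plt.subplots(5, 2, figsize=(16, 15))
for (ax1, ax2), c in zip(axes, countries):
    c_df = wines_df[wines_df.country == c]
    # Plain matplotlib replaces seaborn's kdeplot/boxplot in this sketch.
    ax1.hist(c_df["points"], bins=15)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), len(c_df)), fontsize=16)
    ax2.boxplot(c_df["price"].to_numpy())
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), len(c_df)), fontsize=16)

# Called only after all titles exist, so it can measure them.
fig.tight_layout()
```

In an interactive session, finish with plt.show() as usual.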
If the subplots still overlap (which can happen in rare cases), you can specify the padding explicitly:
fig.tight_layout(pad=0., w_pad=0.3, h_pad=1.0)
Here pad is the padding between the figure edge and the subplots, while w_pad and h_pad are the horizontal and vertical padding between adjacent subplots. All three are given as fractions of the font size. Just try some values until your plot looks nice. (pad=0., w_pad=.3, h_pad=.3) is a good start if you want your plots as tight as possible.
Another option is to pass constrained_layout=True when creating the figure:
fig = plt.figure(figsize=(16, 15), constrained_layout=True)
In that case, remove the fig.tight_layout() line. Constrained layout adjusts the spacing automatically every time the figure is drawn.
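Here is a minimal sketch of the constrained_layout variant. It uses placeholder sine curves instead of the wine data, which is an assumption made only for illustration. Note there is no tight_layout() call. The layout is solved when the figure is drawn, and here the draw is forced with fig.canvas.draw() under the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
# constrained_layout=True lets matplotlib solve the spacing at draw time.
fig, axes = plt.subplots(5, 2, figsize=(16, 15), constrained_layout=True)
for i, ax in enumerate(axes.flat):
    ax.plot(x, np.sin(x + i))  # placeholder data
    ax.set_title("SUBPLOT %d WITH A LONG, LARGE TITLE" % i, fontsize=16)

fig.canvas.draw()  # the layout is computed when the figure is drawn
```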
edit:
One more thing I stumbled upon:
It looks like you chose your figsize so that the figure fits on a standard DIN A4 page in centimeters (typical text width: 16 cm). However, figsize in matplotlib is in inches, so figsize=(16/2.54, 15/2.54) is probably what you want.
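A small helper makes the conversion explicit. cm2inch is a hypothetical name, not a matplotlib function. It simply divides each size by 2.54 cm per inch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

CM_PER_INCH = 2.54

def cm2inch(*sizes_cm):
    """Hypothetical helper: convert sizes in centimeters to inches (matplotlib's unit)."""
    return tuple(s / CM_PER_INCH for s in sizes_cm)

# 16 cm x 15 cm figure, e.g. to match a 16 cm text width
fig = plt.figure(figsize=cm2inch(16, 15))
print(fig.get_size_inches())
```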
I know it is confusing that matplotlib uses inches internally, considering that most of its users are scientists and data engineers, who usually work in SI units. As ImportanceOfBeingErnest pointed out, there are several ongoing discussions about supporting units other than inches.
Related
I have 3 lists to plot as curves. However, every time I run the same plotting code, the 3 curves appear in a random order in the legend, even with ax.legend(loc='lower right', handles=[line1, line2, line3]). Is it possible to fix their order and colors, both in the legend and for the curves in the plot?
EDIT:
My code is below:
import numpy as np
import matplotlib.pyplot as plt

def plot_with_fixed_list(n, **kwargs):
    np.random.seed(0)
    fig, ax1 = plt.subplots()
    my_handles = []
    for key, values in kwargs.items():
        value_name = key
        temp, = ax1.plot(np.arange(1, n + 1, 1).tolist(), values, label=value_name)
        my_handles.append(temp)
    ax1.legend(loc='lower right', handles=my_handles)
    ax1.grid(True, which='both')
    plt.show()
plot_with_fixed_list(300, FA_Hybrid=fa, BP=bp, Ssym_Hybrid=ssym)
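This nondeterminism occurred with Python 3.5 and matplotlib 3.0.0. After I updated to Python 3.6 and matplotlib 3.3.2, the problem was solved. The likely cause is on the Python side. Before Python 3.6, **kwargs was an ordinary dict with arbitrary iteration order. String hash randomization meant that order could change between runs, so kwargs.items() returned the curves in a different order each time. The colors also changed, because they are assigned from the color cycle in plotting order. Python 3.6 preserves keyword-argument order (PEP 468).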
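If you need to support Python versions older than 3.6, one option is to stop relying on **kwargs order and pass an explicit sequence of (label, values) pairs. This is a sketch, not the original code: plot_with_fixed_order and the random fa, bp, ssym arrays are hypothetical stand-ins. Because the lines are always created in the same order, they also take their colors from the default color cycle in the same order:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

def plot_with_fixed_order(n, series):
    """Hypothetical variant: series is a sequence of (label, values) pairs, plotted in order."""
    fig, ax = plt.subplots()
    handles = []
    for label, values in series:
        line, = ax.plot(np.arange(1, n + 1), values, label=label)
        handles.append(line)
    ax.legend(loc="lower right", handles=handles)
    ax.grid(True, which="both")
    return fig, ax, handles

rng = np.random.default_rng(0)
n = 300
fa, bp, ssym = (rng.random(n) for _ in range(3))  # hypothetical data
fig, ax, handles = plot_with_fixed_order(n, [("FA_Hybrid", fa), ("BP", bp), ("Ssym_Hybrid", ssym)])
print([h.get_label() for h in handles])
```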
I would like to apply a custom normalization to histograms in matplotlib. In particular, I have two histograms and I would like to divide each of them by a given number (the number of generated events).
I don't want the standard normalization (density=True), because it makes the total area equal to 1. What I want is to divide the value of each bin by a given number N. For example, if my histogram has 2 bins, one with 5 entries and one with 3, the resulting "normalized" (or "divided") histogram should have 5/N in the first bin and 3/N in the second.
I searched far and wide and found nothing really helpful. Do you have a handy solution? This is my code, using pandas:
num_bins = 128
list_1 = dataframe_1['E']
list_2 = dataframe_2['E']
fig, ax = plt.subplots()
ax.set_xlabel('Proton energy [MeV]')
ax.set_ylabel('Normalized frequency')
ax.set_title('Proton energy distribution')
n, bins, patches = ax.hist(list_1, num_bins, density=1, alpha=0.5, color='red', ec='red', label='label_1')
n, bins, patches = ax.hist(list_2, num_bins, density=1, alpha=0.5, color='blue', ec='blue', label='label_2')
plt.legend(loc='upper center', fontsize='x-large')
fig.savefig('NiceTitle.pdf')
plt.close('all')
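One possible approach, sketched below, relies on the weights parameter of hist(), which the question's code doesn't use yet. Giving every entry the weight 1/N makes each bin height equal to count/N, with no density normalization involved. The random energies and the event counts N_1 and N_2 are hypothetical placeholders for dataframe_1['E'], dataframe_2['E'] and your numbers of generated events:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
energies_1 = rng.normal(50, 10, 1000)  # hypothetical stand-in for dataframe_1['E']
energies_2 = rng.normal(60, 12, 800)   # hypothetical stand-in for dataframe_2['E']
N_1, N_2 = 5000, 4000                  # hypothetical numbers of generated events

num_bins = 128
fig, ax = plt.subplots()
# Each entry contributes 1/N instead of 1, so every bin height is count / N.
n1, bins, _ = ax.hist(energies_1, num_bins, weights=np.full(len(energies_1), 1 / N_1),
                      alpha=0.5, color="red", ec="red", label="label_1")
# Reuse the same bin edges so the two histograms are directly comparable.
n2, _, _ = ax.hist(energies_2, bins, weights=np.full(len(energies_2), 1 / N_2),
                   alpha=0.5, color="blue", ec="blue", label="label_2")
ax.legend(loc="upper center", fontsize="x-large")
```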
I want to place 3 plots using subplots: two plots in the top row and one plot that occupies the entire second row.
My code creates a gap between the top two plots and the lower plot. How can I correct that?
df_CI
Country  China  India
1980      5123   8880
1981      6682   8670
1982      3308   8147
1983      1863   7338
1984      1527   5704
fig = plt.figure()  # create figure
ax0 = fig.add_subplot(221)  # add subplot 1 (2 rows, 2 columns, first plot)
ax1 = fig.add_subplot(222)  # add subplot 2 (2 rows, 2 columns, second plot)
ax2 = fig.add_subplot(313)  # a 3-digit number where the hundreds represent nrows, the tens represent ncols
                            # and the units represent plot_number.
# Subplot 1: Box plot
df_CI.plot(kind='box', color='blue', vert=False, figsize=(20, 20), ax=ax0)  # add to subplot 1
ax0.set_title('Box Plots of Immigrants from China and India (1980 - 2013)')
ax0.set_xlabel('Number of Immigrants')
ax0.set_ylabel('Countries')
# Subplot 2: Line plot
df_CI.plot(kind='line', figsize=(20, 20), ax=ax1)  # add to subplot 2
ax1.set_title('Line Plots of Immigrants from China and India (1980 - 2013)')
ax1.set_ylabel('Number of Immigrants')
ax1.set_xlabel('Years')
# Subplot 3: Bar plot
df_CI.plot(kind='bar', figsize=(20, 20), ax=ax2)  # add to subplot 3
ax2.set_title('Bar Plot of Immigrants from China and India (1980 - 2013)')
ax2.set_xlabel('Years')
ax2.set_ylabel('Number of Immigrants')
plt.show()
I've always found the subplot syntax a little difficult.
With these calls
ax0 = fig.add_subplot(221)
ax1 = fig.add_subplot(222)
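you're dividing your figure into a 2x2 grid and filling the first row.
ax2 = fig.add_subplot(313)
Now you're dividing it into three rows and filling the last one.
You're basically creating two independent subplot grids. The top row of the 2x2 grid ends about halfway down the figure, while the last cell of the 3x1 grid only occupies the bottom third, so the space in between stays empty. There is no easy way to control the spacing of subplots in one grid relative to the other.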
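To see the gap numerically, you can compare the axes positions in figure coordinates. This is a minimal sketch that assumes matplotlib's default subplot parameters; changed rcParams would move the numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig = plt.figure()
ax0 = fig.add_subplot(221)  # a cell in a 2x2 grid
ax2 = fig.add_subplot(313)  # a cell in an unrelated 3x1 grid
# Positions are in figure coordinates (0 = bottom, 1 = top).
top_of_bottom_plot = ax2.get_position().y1
bottom_of_top_plot = ax0.get_position().y0
print("empty band from %.2f to %.2f" % (top_of_bottom_plot, bottom_of_top_plot))
```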
A much easier and more Pythonic way is to use GridSpec to create a single, finer grid and address it with Python slicing.
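import matplotlib as mpl
import matplotlib.pyplot as plt  # importing pyplot also makes mpl.gridspec available

fig = plt.figure()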
gs = mpl.gridspec.GridSpec(2, 2, wspace=0.25, hspace=0.25) # 2x2 grid
ax0 = fig.add_subplot(gs[0, 0]) # first row, first col
ax1 = fig.add_subplot(gs[0, 1]) # first row, second col
ax2 = fig.add_subplot(gs[1, :]) # full second row
And now you can also easily tune spacing with wspace and hspace.
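Putting it together for the question's plots gives a sketch that rests on two assumptions. First, df_CI is rebuilt from only the five rows shown above. Second, it uses fig.add_gridspec() (available in matplotlib 3.0 and later) as a shortcut for creating a GridSpec tied to the figure. The box plot is drawn vertically here, so its axis labels are swapped relative to the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Only the five rows shown in the question (1980-1984), not the full 1980-2013 data.
df_CI = pd.DataFrame(
    {"China": [5123, 6682, 3308, 1863, 1527], "India": [8880, 8670, 8147, 7338, 5704]},
    index=[1980, 1981, 1982, 1983, 1984],
)
df_CI.columns.name = "Country"

fig = plt.figure(figsize=(20, 20))
gs = fig.add_gridspec(2, 2, wspace=0.25, hspace=0.25)  # one shared 2x2 grid
ax0 = fig.add_subplot(gs[0, 0])  # first row, first column
ax1 = fig.add_subplot(gs[0, 1])  # first row, second column
ax2 = fig.add_subplot(gs[1, :])  # whole second row

df_CI.plot(kind="box", color="blue", ax=ax0)
ax0.set_title("Box Plots of Immigrants from China and India")
ax0.set_xlabel("Countries")
ax0.set_ylabel("Number of Immigrants")

df_CI.plot(kind="line", ax=ax1)
ax1.set_title("Line Plots of Immigrants from China and India")
ax1.set_xlabel("Years")
ax1.set_ylabel("Number of Immigrants")

df_CI.plot(kind="bar", ax=ax2)
ax2.set_title("Bar Plot of Immigrants from China and India")
ax2.set_xlabel("Years")
ax2.set_ylabel("Number of Immigrants")
```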
More complex layouts are also a lot easier; it's just the familiar slicing syntax.
fig = plt.figure()
gs = mpl.gridspec.GridSpec(10, 10, wspace=0.25, hspace=0.25)
fig.add_subplot(gs[2:8, 2:8])
fig.add_subplot(gs[0, :])
for i in range(5):
    fig.add_subplot(gs[1, (i*2):(i*2+2)])
fig.add_subplot(gs[2:, :2])
fig.add_subplot(gs[8:, 2:4])
fig.add_subplot(gs[8:, 4:9])
fig.add_subplot(gs[2:8, 8])
fig.add_subplot(gs[2:, 9])
fig.add_subplot(gs[3:6, 3:6])
# fancy colors
cmap = plt.get_cmap("viridis")  # mpl.cm.get_cmap was deprecated in matplotlib 3.7 and later removed
naxes = len(fig.axes)
for i, ax in enumerate(fig.axes):
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_facecolor(cmap(float(i)/(naxes-1)))
I have a subplot that plots a line (x, y) and a particular point (xx, yy). I want to highlight (xx, yy), so I've plotted it with scatter. However, even though I call scatter after the original plot, the point still shows up behind the line. How can I fix this? MWE below (Julia, using PyPlot):
x = 1:10
y = 1:10
xx = 5
yy = 5
fig, ax = subplots()
ax[:plot](x,y)
ax[:scatter](xx,yy, color="red", label="h_star", s=100)
legend()
xlabel("x")
ylabel("y")
title("test")
grid("on")
You can control which artists are drawn on top of each other with the zorder argument. The matplotlib zorder demo gives a brief explanation:
The default drawing order for axes is patches, lines, text. This order is determined by the zorder attribute. The following defaults are set:
Artist                     Z-order
Patch / PatchCollection    1
Line2D / LineCollection    2
Text                       3
You can change the order for individual artists by setting the zorder. Any individual plot() call can set a value for the zorder of that particular item.
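You can check these defaults directly. This sketch uses the question's data in Python form and prints the type and zorder of the artists returned by scatter() and plot():

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot(range(1, 11), range(1, 11))
points = ax.scatter(5, 5, color="red", s=100)
# scatter() returns a PathCollection, which is drawn below Line2D by default.
print(type(points).__name__, points.get_zorder())
print(type(line).__name__, line.get_zorder())
```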
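Here is a full example based on the code in the question, translated to Python. The point is hidden because scatter() returns a PathCollection, whose default zorder of 1 is below the line's zorder of 2, so raising the scatter's zorder fixes it: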
import matplotlib.pyplot as plt
x = range(1, 11)  # Julia's 1:10 includes 10
y = range(1, 11)
xx = 5
yy = 5
fig, ax = plt.subplots()
ax.plot(x, y)
# could set zorder very high, say 10, to "make sure" it will be on top
ax.scatter(xx, yy, color="red", label="h_star", s=100, zorder=3)
plt.legend()
plt.xlabel("x")
plt.ylabel("y")
plt.title("test")
plt.grid(True)
plt.show()
Which gives:
I tried to plot the PDF and CDF of a distribution in one plot. When I plot them together, the PDF and CDF don't match; when I plot them separately, they do. Why? Both green curves come from the same equation, but they have different shapes...
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import maxwell

def MBdist(n, loct, scale):
    data = maxwell.rvs(loc=loct, scale=scale, size=n)
    params = maxwell.fit(data, floc=0)
    return data, params

if __name__ == '__main__':
    data, para = MBdist(10000, 0, 0.5)
    plt.subplot(211)
    plt.hist(data, bins=20, density=True)  # normed= was removed in matplotlib 3.1
    x = np.linspace(0, 5, 20)
    print(x)
    plt.plot(x, maxwell.pdf(x, *para), 'r', maxwell.cdf(x, *para), 'g')
    plt.subplot(212)
    plt.plot(x, maxwell.cdf(x, *para), 'g')
    plt.show()
You don't pass an x to go with the second curve, so it is plotted against its index (0, 1, ..., 19) instead of x. It should be:
plt.plot(x, maxwell.pdf(x, *para),'r',x, maxwell.cdf(x, *para), 'g')
This interface is a particularly magical bit of argument parsing that was mimicked from MATLAB. I would suggest instead:
fig, ax = plt.subplots()
ax.plot(x, maxwell.pdf(x, *para),'r')
ax.plot(x, maxwell.cdf(x, *para), 'g')
which, while a bit more verbose, is much clearer.
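The difference is easy to verify by inspecting the x data of the returned Line2D objects. In this sketch the two curves are simple placeholder functions rather than the Maxwell fit, so scipy isn't needed:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 20)
y_pdf = x * np.exp(-x)   # placeholder curve, not the Maxwell PDF
y_cdf = 1 - np.exp(-x)   # placeholder curve, not the Maxwell CDF

fig, ax = plt.subplots()
# Buggy form: the second (y, fmt) group has no x, so it is plotted against 0..19.
bad_pdf, bad_cdf = ax.plot(x, y_pdf, "r", y_cdf, "g")
print(bad_cdf.get_xdata()[:3])

fig, ax = plt.subplots()
# Fixed form: one explicit plot() call per curve, each with its own x.
good_pdf, = ax.plot(x, y_pdf, "r")
good_cdf, = ax.plot(x, y_cdf, "g")
```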