I tried to plot the pdf and cdf of a distribution in one plot. When I plot them together, the cdf does not match the pdf; when I plot the cdf separately, it looks right. Why? Both green curves come from the same expression, yet they have different shapes...
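import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import maxwell

def MBdist(n, loct, scale):
    # Draw n samples, then fit a Maxwell distribution with the location fixed at 0
    data = maxwell.rvs(loc=loct, scale=scale, size=n)
    params = maxwell.fit(data, floc=0)
    return data, params

if __name__ == '__main__':
    data, para = MBdist(10000, 0, 0.5)
    plt.subplot(211)
    # density=True replaces the normed=True argument of older matplotlib versions
    plt.hist(data, bins=20, density=True)
    x = np.linspace(0, 5, 20)
    print(x)
    plt.plot(x, maxwell.pdf(x, *para),'r',maxwell.cdf(x, *para), 'g')
    plt.subplot(212)
    plt.plot(x, maxwell.cdf(x, *para), 'g')
    plt.show()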
You don't pass in an 'x' to go with the second curve, so the cdf is plotted against its index (0 to 19) instead of against x. It should be
plt.plot(x, maxwell.pdf(x, *para),'r',x, maxwell.cdf(x, *para), 'g')
This interface is a particularly magical bit of arg-parsing that was mimicked from MATLAB. I would suggest
fig, ax = plt.subplots()
ax.plot(x, maxwell.pdf(x, *para),'r')
ax.plot(x, maxwell.cdf(x, *para), 'g')
which while a bit more verbose line-wise is much clearer.
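The index fallback can be checked directly on the returned line objects. This is a minimal sketch, not the original code: y1 and y2 are stand-ins for the pdf and cdf values so that it runs without scipy, and the Agg backend is only there so it works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 20)
y1 = np.exp(-x)        # stand-in for the pdf values
y2 = 1 - np.exp(-x)    # stand-in for the cdf values

fig, ax = plt.subplots()
# The second group (y2, 'g') has no x of its own, so matplotlib
# falls back to the sample index 0..19 for it.
red, green = ax.plot(x, y1, 'r', y2, 'g')

print(red.get_xdata()[-1])    # last x of the first curve, taken from x
print(green.get_xdata()[-1])  # last x of the second curve, an index rather than 5
```

The first curve ends at x = 5 while the second ends at index 19, which is why the two green curves in the question have different shapes.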
Related
I have 3 lists to plot as curves. But every time I run the same plotting code, even with ax.legend(loc='lower right', handles=[line1, line2, line3]), the three curves show up in a random order in the legend. Is it possible to fix their order and colors in the legend as well as in the plot?
EDIT:
My code is as below:
def plot_with_fixed_list(n, **kwargs):
    np.random.seed(0)
    fig, ax1 = plt.subplots()
    my_handles = []
    for key, values in kwargs.items():
        value_name = key
        temp, = ax1.plot(np.arange(1, n+ 1, 1).tolist(), values, label=value_name)
        my_handles.append(temp)
    ax1.legend(loc='lower right', handles=my_handles)
    ax1.grid(True, which='both')
    plt.show()

plot_with_fixed_list(300, FA_Hybrid=fa, BP=bp, Ssym_Hybrid=ssym)
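I saw this nondeterminism with python==3.5 and matplotlib==3.0.0. After I updated to python==3.6 and matplotlib==3.3.2, the problem was solved. A likely explanation is that Python 3.6 preserves the order of keyword arguments (PEP 468), so kwargs.items() yields the curves in the order they were passed, whereas under 3.5 that order was not guaranteed.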
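If upgrading is not an option, one way to avoid depending on keyword-argument order is to pass the curves as an explicit list and give each one a fixed color. The sketch below is an illustration under assumptions: plot_in_fixed_order, the colors mapping, and the random stand-in data are hypothetical and not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

def plot_in_fixed_order(n, series, colors):
    """Plot (label, values) pairs in list order, with one fixed color per label."""
    fig, ax = plt.subplots()
    xs = np.arange(1, n + 1)
    handles = []
    for label, values in series:  # a list keeps its order on every Python version
        line, = ax.plot(xs, values, label=label, color=colors[label])
        handles.append(line)
    ax.legend(loc='lower right', handles=handles)
    ax.grid(True, which='both')
    return fig, ax

# Stand-in data; the real fa, bp and ssym lists are not shown in the question.
rng = np.random.default_rng(0)
names = ('FA_Hybrid', 'BP', 'Ssym_Hybrid')
series = [(name, rng.random(300).cumsum()) for name in names]
colors = {'FA_Hybrid': 'tab:blue', 'BP': 'tab:orange', 'Ssym_Hybrid': 'tab:green'}

fig, ax = plot_in_fixed_order(300, series, colors)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Because both the order and the colors are stated explicitly, the legend no longer depends on how the interpreter orders the keyword arguments.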
I am trying to draw linear regression plots with Seaborn, and I end up with this:
and, under it, these empty plots:
I don't need the last 3 small subplots. How can I get rid of them or, failing that, get them plotted correctly together with the first 3 main subplots above?
Here is the code I used:
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
for col, ax in zip(['gross_sqft_thousands','land_sqft_thousands','total_units'], axes.flatten()):
    ax.tick_params(axis='x', rotation=85)
    ax.set_ylabel(col, fontsize=15)
    sns.jointplot(x="sale_price_millions", y=col, data=clean_df, kind='reg', joint_kws={'line_kws':{'color':'cyan'}}, ax=ax)
fig.suptitle('Sale Price vs Continuous Variables', position=(.5,1.02), fontsize=20)
fig.tight_layout()
fig.show()
Imagine I have some dataset for wines and I find the top 5 wine producing countries:
# Find top 5 wine producing countries.
top_countries = wines_df.groupby('country').size().reset_index(name='n').sort_values('n', ascending=False)[:5]['country'].tolist()
Now that I have the values, I attempt to plot the results in 10 plots, 5 rows 2 columns.
fig = plt.figure(figsize=(16, 15))
fig.tight_layout()
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
plt.show()
Even with this result, I still have my subplots overlapping.
Am I doing something wrong? Using python3.6 with matplotlib==2.2.2
As Thomas Kühn said, you have to move tight_layout() after doing the plots, like in:
fig = plt.figure(figsize=(16, 15))
i = 0
for c in top_countries:
    c_df = wines_df[wines_df.country == c]
    i +=1
    ax1 = fig.add_subplot(5,2,i)
    i +=1
    ax2 = fig.add_subplot(5,2,i)
    sns.kdeplot(c_df['points'], ax=ax1)
    ax1.set_title("POINTS OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
    sns.boxplot(c_df['price'], ax=ax2)
    ax2.set_title("PRICE OF ALL WINES IN %s, n=%d" % (c.upper(), c_df.shape[0]), fontsize=16)
fig.tight_layout()
plt.show()
If it is still overlapping (this can happen in rare cases), you can specify the padding with:
fig.tight_layout(pad=0., w_pad=0.3, h_pad=1.0)
Here pad is the padding between the figure edge and the subplots, and w_pad and h_pad are the horizontal and vertical padding between adjacent subplots, all given as a fraction of the font size. Just try some values until your plot looks good. (pad=0., w_pad=.3, h_pad=.3) is a good start if you want your plots as tight as possible.
Another possibility is to specify constrained_layout=True in the figure:
fig = plt.figure(figsize=(16, 15), constrained_layout=True)
Now you can delete the line fig.tight_layout().
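To see why the call order matters, here is a minimal sketch with random stand-in data instead of the wine dataset (the histograms, panel titles, and figure size are assumptions for illustration). It records the vertical subplot spacing before and after tight_layout() is called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig = plt.figure(figsize=(8, 10))
default_hspace = fig.subplotpars.hspace  # spacing before any layout adjustment

for i in range(10):
    ax = fig.add_subplot(5, 2, i + 1)
    ax.hist(rng.normal(size=200), bins=20)
    ax.set_title("PANEL %d" % (i + 1), fontsize=16)

# Called last, so the titles and tick labels already exist and can be measured.
fig.tight_layout()
adjusted_hspace = fig.subplotpars.hspace

print(default_hspace, adjusted_hspace)
```

When tight_layout() runs on an empty figure, as in the question, there is nothing to measure yet, so the spacing is not adapted to the titles that are added later.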
edit:
One more thing I stumbled upon:
It seems like you are specifying your figsize so that it fits on a standard DIN A4 paper in centimeters (typical textwidth: 16cm). But figsize in matplotlib is in inches. So probably replacing the figsize with figsize=(16/2.54, 15/2.54) might be better.
I know that it is absolutely confusing that matplotlib internally uses inches as units, considering that it is mostly the scientific community and data engineers working with matplotlib (and these usually use SI units). As ImportanceOfBeingErnest pointed out, there are several discussions going on about how to implement other units than inches.
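A small helper keeps the conversion in one place. This is only a sketch: cm_to_inch is a hypothetical name, not a matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

CM_PER_INCH = 2.54

def cm_to_inch(*lengths_cm):
    """Convert lengths in centimeters to inches, as expected by figsize."""
    return tuple(length / CM_PER_INCH for length in lengths_cm)

# A 16 cm x 15 cm figure, expressed in the inches that figsize requires.
fig = plt.figure(figsize=cm_to_inch(16, 15))
width_in, height_in = fig.get_size_inches()
print(width_in, height_in)
```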
I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.
ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax = ax)
plt.show()
What is the best way to annotate all the points for a multi-column DataFrame?
Here's a (very) slightly slicker version of Dan Allan's answer:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string
df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)},
                  index=list(string.ascii_lowercase[:10]))
Which gives:
          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469
And then:
fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)
for k, v in df.iterrows():
    ax.annotate(k, v)
Finally, if you're in interactive mode you might need to refresh the plot:
fig.canvas.draw()
Which produces:
Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:
# plt.get_cmap is used because cm.get_cmap is no longer available in recent matplotlib releases
cmap = plt.get_cmap('Spectral')
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0,
        c=range(len(df)), colormap=cmap)
for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')
Which looks a lot nicer:
Do you want to use one of the other columns as the text of the annotation? This is something I did recently.
Starting with some example data
In [1]: df
Out[1]:
          x         y val
0 -1.015235  0.840049   a
1 -0.427016  0.880745   b
2  0.744470 -0.401485   c
3  1.334952 -0.708141   d
4  0.127634 -1.335107   e
Plot the points. I plot y against x, in this example.
ax = df.set_index('x')['y'].plot(style='o')
Write a function that loops over x, y, and the value to annotate beside the point.
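def label_point(x, y, val, ax):
    # Combine the three Series into one frame so each row carries x, y and its label
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)
plt.draw()  # assumes matplotlib.pyplot is imported as plt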
Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:
ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.
The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.
Here was an alternative that avoids the loop using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill and couldn't get the colorbar to go away):
ax = df.plot('x', 'y', kind='scatter', s=50 )
def annotate_df(row):
    ax.annotate(row.name, row.values,
                xytext=(10,-5),
                textcoords='offset points',
                size=18,
                color='darkslategrey')

_ = df.apply(annotate_df, axis=1)
Edit Notes
I edited my code example recently. Originally it used the same:
fig, ax = plt.subplots()
as the other posts to expose the axes, however this is unnecessary and makes the:
import matplotlib.pyplot as plt
line also unnecessary.
Also note:
If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
Depending on the points, you may have to play with the xytext values to get better placements.
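Following the note about reproducibility, here is a seeded sketch that also covers the case from the question, where df.plot() draws one line per column. The column names, the seed, and the (5, 5) offset are assumptions for illustration; each point is labeled with its own value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fixed seed so the points land in the same place on every run.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)).round(2), columns=['a', 'b', 'c'])

fig, ax = plt.subplots()
df.plot(ax=ax, marker='o')  # one line per column, plotted against the index

# Label every point of every column with its value, shifted slightly off the marker.
for column in df.columns:
    for x, y in df[column].items():
        ax.annotate(f"{y:.2f}", (x, y),
                    xytext=(5, 5), textcoords='offset points')

print(len(ax.texts))  # number of annotations placed on the axes
```

With many rows the labels will crowd each other, so the xytext offset usually needs adjusting per dataset.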