How to get rid of plots under the main plot in Seaborn?

I'm trying to plot linear regression plots with Seaborn, and I end up with this:
and these empty plots under it:
I don't need the last 3 small subplots. How can I get rid of them, or at least get them plotted correctly together with the main first 3 subplots above?
Here is the code I used:
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
for col, ax in zip(['gross_sqft_thousands','land_sqft_thousands','total_units'], axes.flatten()):
    ax.tick_params(axis='x', rotation=85)
    ax.set_ylabel(col, fontsize=15)
    sns.jointplot(x="sale_price_millions", y=col, data=clean_df, kind='reg', joint_kws={'line_kws':{'color':'cyan'}}, ax=ax)
fig.suptitle('Sale Price vs Continuous Variables', position=(.5,1.02), fontsize=20)
fig.tight_layout()
fig.show()
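One possible explanation, offered as a sketch rather than a confirmed answer: sns.jointplot is a figure-level function that builds its own figure for every call, so it does not draw into the axes created by plt.subplots. The axes-level sns.regplot does accept ax=, so one option is to use it instead. The data below is synthetic; only the column names are taken from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for clean_df; only the column names come from the question.
rng = np.random.default_rng(0)
price = rng.uniform(0.1, 5.0, size=200)
clean_df = pd.DataFrame({
    "sale_price_millions": price,
    "gross_sqft_thousands": 2.0 * price + rng.normal(size=200),
    "land_sqft_thousands": 1.5 * price + rng.normal(size=200),
    "total_units": 3.0 * price + rng.normal(size=200),
})

cols = ["gross_sqft_thousands", "land_sqft_thousands", "total_units"]
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
for col, ax in zip(cols, axes.flatten()):
    # regplot is axes-level: it draws into the axes passed via ax=
    sns.regplot(x="sale_price_millions", y=col, data=clean_df,
                line_kws={"color": "cyan"}, ax=ax)
    ax.tick_params(axis="x", rotation=85)
    ax.set_ylabel(col, fontsize=15)
fig.suptitle("Sale Price vs Continuous Variables", y=1.02, fontsize=20)
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. This version loses the marginal histograms that jointplot draws; if those are needed, each jointplot has to stay in its own figure.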

Related

Directly annotate matplotlib stacked bar graph [duplicate]

I would like to create an annotation to a bar chart that compares the value of the bar to two reference values. An overlay such as shown in the picture, a kind of staff gauge, is possible, but I'm open to more elegant solutions.
The bar chart is generated with the pandas API to matplotlib (e.g. data.plot(kind="bar")), so it would be a plus if the solution plays nicely with that.
You may use smaller bars for the target and benchmark indicators. Pandas cannot annotate bars automatically, but you can simply loop over the values and use matplotlib's pyplot.annotate instead.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randint(5,15, size=5)
t = (a+np.random.normal(size=len(a))*2).round(2)
b = (a+np.random.normal(size=len(a))*2).round(2)
df = pd.DataFrame({"a":a, "t":t, "b":b})
fig, ax = plt.subplots()
df["a"].plot(kind='bar', ax=ax, legend=True)
df["b"].plot(kind='bar', position=0., width=0.1, color="lightblue",legend=True, ax=ax)
df["t"].plot(kind='bar', position=1., width=0.1, color="purple", legend=True, ax=ax)
for i, rows in df.iterrows():
    plt.annotate(rows["a"], xy=(i, rows["a"]), rotation=0, color="C0")
    plt.annotate(rows["b"], xy=(i+0.1, rows["b"]), color="lightblue", rotation=+20, ha="left")
    plt.annotate(rows["t"], xy=(i-0.1, rows["t"]), color="purple", rotation=-20, ha="right")
ax.set_xlim(-1,len(df))
plt.show()
There's no direct way to annotate a bar plot (as far as I am aware). Some time ago I needed to annotate one, so I wrote this; perhaps you can adapt it to your needs.
import matplotlib.pyplot as plt
import numpy as np
ax = plt.subplot(111)
ax.set_xlim(-0.2, 3.2)
# Pass visibility positionally; the b= keyword is no longer accepted by newer matplotlib releases.
ax.grid(True, which='major', color='k', linestyle=':', lw=.5, zorder=1)
# x,y data
x = np.arange(4)
y = np.array([5, 12, 3, 7])
# Define upper y limit leaving space for the text above the bars.
up = max(y) * .03
ax.set_ylim(0, max(y) + 3 * up)
ax.bar(x, y, align='center', width=0.2, color='g', zorder=4)
# Add text to bars
for xi, yi, l in zip(x, y, map(str, y)):
    ax.text(xi - len(l) * .02, yi + up, l,
            bbox=dict(facecolor='w', edgecolor='w', alpha=.5))
ax.set_xticks(x)
ax.set_xticklabels(['text1', 'text2', 'text3', 'text4'])
ax.tick_params(axis='x', which='major', labelsize=12)
plt.show()
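Newer matplotlib releases (3.4 and later) do provide a direct helper, Axes.bar_label, which labels every bar in a container. A minimal sketch, assuming matplotlib >= 3.4 and reusing the example values from the code above:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)
y = np.array([5, 12, 3, 7])

fig, ax = plt.subplots()
bars = ax.bar(x, y, align='center', width=0.2, color='g')  # returns a BarContainer
labels = ax.bar_label(bars, padding=3)  # one text label per bar, just above its top edge
ax.set_xticks(x)
ax.set_xticklabels(['text1', 'text2', 'text3', 'text4'])
ax.margins(y=0.1)  # leave headroom so the labels are not clipped
```

For bar charts created through pandas (df.plot(kind='bar', ax=ax)), the bar containers are available afterwards as ax.containers and can be passed to ax.bar_label in the same way.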

How to determine the matplotlib legend?

I have 3 lists to plot as curves. But every time I run the same plt lines, even with ax.legend(loc='lower right', handles=[line1, line2, line3]), the 3 curves change places randomly in the legend, as shown below. Is it possible to fix their order and colors, both in the legend and in the plot?
EDIT:
My code is as below:
def plot_with_fixed_list(n, **kwargs):
    np.random.seed(0)
    fig, ax1 = plt.subplots()
    my_handles = []
    for key, values in kwargs.items():
        value_name = key
        temp, = ax1.plot(np.arange(1, n+ 1, 1).tolist(), values, label=value_name)
        my_handles.append(temp)
    ax1.legend(loc='lower right', handles=my_handles)
    ax1.grid(True, which='both')
    plt.show()
plot_with_fixed_list(300, FA_Hybrid=fa, BP=bp, Ssym_Hybrid=ssym)
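This nondeterminism occurred with python==3.5 and matplotlib==3.0.0. After I updated to python==3.6 and matplotlib==3.3.2, the problem was solved. A likely cause is that before Python 3.6, **kwargs did not preserve the order in which keyword arguments were passed (PEP 468 guarantees this from 3.6 on), so kwargs.items() could iterate in a different order on each run.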
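To make the order and colors independent of how keyword arguments are iterated, one option is to pass an explicit list and set each color by hand. This is only a sketch: the function name plot_in_fixed_order, the colors, and the placeholder data are assumptions, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_in_fixed_order(n, series):
    """Plot (label, values, color) triples in exactly the order given.

    `series` is a list, so its iteration order is fixed on any Python version.
    """
    fig, ax = plt.subplots()
    x = np.arange(1, n + 1)
    handles = []
    for label, values, color in series:
        line, = ax.plot(x, values, label=label, color=color)
        handles.append(line)
    ax.legend(loc='lower right', handles=handles)
    ax.grid(True, which='both')
    return fig, ax

# Placeholder data standing in for fa, bp and ssym from the question.
rng = np.random.default_rng(0)
fa, bp, ssym = (np.cumsum(rng.random(300)) for _ in range(3))

fig, ax = plot_in_fixed_order(300, [
    ("FA_Hybrid", fa, "tab:blue"),
    ("BP", bp, "tab:orange"),
    ("Ssym_Hybrid", ssym, "tab:green"),
])
```

Because every curve carries its own color, the legend entries and the lines stay matched even if the list is later reordered.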

How to put 3 seaborn scatter plots under one another?

I want to combine all 3 seaborn scatter plots under one "frame".
plt.figure(figsize=(7,15))
plt.subplots(3,1)
sns.scatterplot(x=train['Garage Area'], y=train['SalePrice'])
plt.show()
sns.scatterplot(x=train['Gr Liv Area'], y=train['SalePrice'])
plt.show()
sns.scatterplot(x=train['Overall Cond'], y=train['SalePrice'])
plt.show()
But it creates 5, the first 3 are small according to (7,15) size but the last 2 are different.
I suspect it should be
plt.figure(figsize=(7,15))
fig,ax = plt.subplots(3,1)
ax[0] = fig.add_subplot(sns.scatterplot(x=train['Garage Area'], y=train['SalePrice']))
#plt.show()
ax[1] = fig.add_subplot(sns.scatterplot(x=train['Gr Liv Area'], y=train['SalePrice']))
#plt.show()
ax[2] =fig.add_subplot(sns.scatterplot(x=train['Overall Cond'], y=train['SalePrice']))
plt.show()
but all 3 plots are stuck in the last 3rd chart!
The following is one way to do it: create a figure with 3 subplots (3 rows, 1 column), then pass the respective subplot, ax[0], ax[1] or ax[2], to each of the three sns.scatterplot calls using the keyword ax.
fig, ax = plt.subplots(3, 1, figsize=(7,15))
sns.scatterplot(x=train['Garage Area'], y=train['SalePrice'], ax=ax[0])
sns.scatterplot(x=train['Gr Liv Area'], y=train['SalePrice'], ax=ax[1])
sns.scatterplot(x=train['Overall Cond'], y=train['SalePrice'], ax=ax[2])
plt.show()

Pandas set labeling legend from groupby elements

I'm plotting a kde distribution of 2 dataframes on the same axis, and I need to set a legend saying which line is which dataframe. Now, this is my code:
fig, ax = plt.subplots(figsize=(15,10))
for label, df in dataframe1.groupby('ID'):
    dataframe1.Value.plot(kind="kde", ax=ax,color='r')
for label, df in dataframe2.groupby('ID'):
    dataframe2.Value.plot(kind='kde', ax=ax, color='b')
plt.legend()
plt.title('title here', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('mmHg', fontsize=16)
plt.show()
But the result is this:
How can I show the legends inside the graph as 'values from df1' and 'results from df2'?
Edit:
With the following code I get the result I asked for, but for some dataframes I get the following:
fig, ax = plt.subplots(figsize=(15,10))
sns.kdeplot(akiPEEP['Value'], color="r", label='type 1', ax=ax)
sns.kdeplot(noAkiPEEP['Value'], color="b",label='type 2', ax=ax)
plt.legend()
plt.title('d', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('value', fontsize=16)
plt.show()
A distribution I'm plotting now:
How do I fix this? Also, is it a good idea to plot the rolling means over this distribution as well, or would that make the plot too heavy?
I'm not sure I understand your question, but from your code, it looks like you are trying to plot one KDE per ID value in your dataframes. In which case you would have to do:
for label, df in dataframe1.groupby('ID'):
    df.Value.plot(kind="kde", ax=ax,color='r', label=label)
Notice that I replaced dataframe1 with df in the body of the for loop. df corresponds to the sub-dataframe in which all elements of the ID column have the value label.
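If the goal is instead a single curve per dataframe with the legend texts from the question, a minimal sketch could pass label= directly to each plot call. The data here is synthetic, and pandas' kind="kde" relies on SciPy being installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for dataframe1 and dataframe2 (columns ID and Value, as in the question).
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame({"ID": np.repeat([1, 2], 100), "Value": rng.normal(5, 1, 200)})
dataframe2 = pd.DataFrame({"ID": np.repeat([3, 4], 100), "Value": rng.normal(8, 2, 200)})

fig, ax = plt.subplots(figsize=(15, 10))
# One curve per dataframe; label= supplies the legend text.
dataframe1["Value"].plot(kind="kde", ax=ax, color="r", label="values from df1")
dataframe2["Value"].plot(kind="kde", ax=ax, color="b", label="results from df2")
ax.legend()
ax.set_xlabel("mmHg", fontsize=16)
```

Plotting one curve per dataframe also avoids the repeated legend entries that appear when every ID group is drawn with its own label.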

How to plot 2 histograms side by side?

I have 2 dataframes. I want to plot a histogram based on a column 'rate' for each, side by side. How to do it?
I tried this:
import matplotlib.pyplot as plt
plt.subplot(1,2,1)
dflux.hist('rate' , bins=100)
plt.subplot(1,2,2)
dflux2.hist('rate' , bins=100)
plt.tight_layout()
plt.show()
It did not have the desired effect. It showed two blank charts then one populated chart.
Use subplots to define a figure with two axes. Then specify the axis to plot to within hist using the ax parameter.
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
Demo
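import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Random sample data standing in for the two dataframes
dflux = pd.DataFrame(dict(rate=np.random.randn(10000)))
dflux2 = pd.DataFrame(dict(rate=np.random.randn(10000)))
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
plt.show()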