Pandas set labeling legend from groupby elements - pandas

I'm plotting a kde distribution of 2 dataframes on the same axis, and I need to set a legend saying which line is which dataframe. Now, this is my code:
fig, ax = plt.subplots(figsize=(15,10))
for label, df in dataframe1.groupby('ID'):
    dataframe1.Value.plot(kind="kde", ax=ax,color='r')
for label, df in dataframe2.groupby('ID'):
    dataframe2.Value.plot(kind='kde', ax=ax, color='b')
plt.legend()
plt.title('title here', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('mmHg', fontsize=16)
plt.show()
But the result is this:
How can I show the legend inside the graph, with the entries 'values from df1' and 'results from df2'?
Edit:
With the following code I get the result I asked for in the question. But for some dataframes I get the following result:
fig, ax = plt.subplots(figsize=(15,10))
sns.kdeplot(akiPEEP['Value'], color="r", label='type 1', ax=ax)
sns.kdeplot(noAkiPEEP['Value'], color="b",label='type 2', ax=ax)
plt.legend()
plt.title('d', fontsize=20)
plt.axvline(x=np.pi,color='gray',linestyle='--')
plt.xlabel('value', fontsize=16)
plt.show()
A distribution I'm plotting now:
How do I fix this? Also, is it a good idea to plot the rolling means over this distribution as well, or does that make the plot too cluttered?

I'm not sure I understand your question, but from your code it looks like you are trying to plot one KDE per ID value in your dataframes, in which case you would have to do:
for label, df in dataframe1.groupby('ID'):
    df.Value.plot(kind="kde", ax=ax,color='r', label=label)
Notice that I replaced dataframe1 with df in the body of the for loop: df is the sub-dataframe in which every element of the ID column equals label.
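A minimal sketch that combines the answer's per-group loop with the legend text the question asks for. It assumes made-up data: `dataframe1`/`dataframe2` below are hypothetical stand-ins, and pandas' `kind="kde"` needs SciPy installed. Each group gets its own curve, but only the first curve of each frame carries a label, because matplotlib leaves labels that start with an underscore out of the legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two dataframes in the question
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame({"ID": np.repeat(["a", "b"], 200),
                           "Value": rng.normal(0.0, 1.0, 400)})
dataframe2 = pd.DataFrame({"ID": np.repeat(["a", "b"], 200),
                           "Value": rng.normal(2.0, 1.0, 400)})

fig, ax = plt.subplots(figsize=(15, 10))
sources = [("values from df1", dataframe1, "r"),
           ("results from df2", dataframe2, "b")]
for name, frame, color in sources:
    for i, (label, df) in enumerate(frame.groupby("ID")):
        # Plot the group's own values (df), not the whole frame.
        # Only the first curve per frame gets a legend entry; "_nolegend_"
        # starts with an underscore, so ax.legend() skips it.
        df["Value"].plot(kind="kde", ax=ax, color=color,
                         label=name if i == 0 else "_nolegend_")
ax.legend()
```

If you want one entry per ID instead, pass `label=f"{name}, ID {label}"` for every curve.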

Related

How to determine the matplotlib legend?

I have 3 lists to plot as curves. But every time I run the same plt lines, even with ax.legend(loc='lower right', handles=[line1, line2, line3]), these 3 lists appear in the legend in a random order, as shown below. Is it possible to fix their order and their colors, both in the legend and for the curves in the plot?
EDIT:
My code is as below:
def plot_with_fixed_list(n, **kwargs):
    np.random.seed(0)
    fig, ax1 = plt.subplots()
    my_handles = []
    for key, values in kwargs.items():
        value_name = key
        temp, = ax1.plot(np.arange(1, n+ 1, 1).tolist(), values, label=value_name)
        my_handles.append(temp)
    ax1.legend(loc='lower right', handles=my_handles)
    ax1.grid(True, which='both')
    plt.show()
plot_with_fixed_list(300, FA_Hybrid=fa, BP=bp, Ssym_Hybrid=ssym)
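This nondeterminism showed up with python==3.5 and matplotlib==3.0.0; after I updated to python==3.6 and matplotlib==3.3.2, the problem was solved. The Python upgrade is the likely fix: before Python 3.6 the order of **kwargs was not guaranteed (PEP 468 made it follow the call order from 3.6 on), and string hash randomization can change that order from run to run, so kwargs.items() could yield the curves in a different order each time.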
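If upgrading is not an option, one way to avoid depending on `**kwargs` order at all is to pass an ordered list of `(name, values)` pairs and give each curve an explicit color. This is a sketch, not the original code: the `series`/`colors` parameters and the random data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_with_fixed_list(n, series, colors=None):
    """series: list of (name, values) pairs; the list order fixes the legend order."""
    fig, ax1 = plt.subplots()
    x = np.arange(1, n + 1)
    handles = []
    for i, (name, values) in enumerate(series):
        # Explicit color per position, so curve i always gets the same color
        color = colors[i] if colors else f"C{i}"
        line, = ax1.plot(x, values, label=name, color=color)
        handles.append(line)
    ax1.legend(loc="lower right", handles=handles)
    ax1.grid(True, which="both")
    return fig, ax1

# Hypothetical placeholder data standing in for fa, bp and ssym
rng = np.random.default_rng(0)
n = 300
fa, bp, ssym = (rng.random(n) for _ in range(3))
fig, ax = plot_with_fixed_list(n, [("FA_Hybrid", fa), ("BP", bp), ("Ssym_Hybrid", ssym)])
```

A list makes the order part of the call itself, so it does not depend on how the Python version handles keyword arguments.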

How to get rid of plots under mainplot in Seaborn?

I'm trying to make a linear-regression plot with Seaborn and I end up with this:
and under it these empty plots:
I don't need the last 3 small subplots. How can I get rid of them, or at least get them plotted correctly together with the first 3 main subplots above?
Here is the code I used:
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
for col, ax in zip(['gross_sqft_thousands','land_sqft_thousands','total_units'], axes.flatten()):
    ax.tick_params(axis='x', rotation=85)
    ax.set_ylabel(col, fontsize=15)
    sns.jointplot(x="sale_price_millions", y=col, data=clean_df, kind='reg', joint_kws={'line_kws':{'color':'cyan'}}, ax=ax)
fig.suptitle('Sale Price vs Continuous Variables', position=(.5,1.02), fontsize=20)
fig.tight_layout()
fig.show()
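A likely cause: `sns.jointplot` is a figure-level function, so every call builds its own new figure (a `JointGrid`) instead of living inside the subplot passed as `ax`. A sketch using the axes-level `sns.regplot`, which does accept `ax`; the `clean_df` below is made up and only reuses the question's column names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for clean_df, using the column names from the question
rng = np.random.default_rng(0)
clean_df = pd.DataFrame({
    "sale_price_millions": rng.random(50),
    "gross_sqft_thousands": rng.random(50),
    "land_sqft_thousands": rng.random(50),
    "total_units": rng.integers(1, 20, 50),
})

cols = ["gross_sqft_thousands", "land_sqft_thousands", "total_units"]
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
for col, ax in zip(cols, axes.flatten()):
    # regplot is axes-level: it draws into the given ax, no extra figure
    sns.regplot(x="sale_price_millions", y=col, data=clean_df,
                line_kws={"color": "cyan"}, ax=ax)
    # Set labels after regplot, which writes its own axis labels
    ax.tick_params(axis="x", rotation=85)
    ax.set_ylabel(col, fontsize=15)
fig.suptitle("Sale Price vs Continuous Variables", fontsize=20)
fig.tight_layout()
```

This gives up jointplot's marginal histograms; if you need those, each jointplot has to be its own figure.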

how to customize x-axis in matplotlib when plotting

I have a bar chart whose x-axis values are 1, 2, 3, ..., 12,
so my bar chart looks something like this:
How can I change the labels shown on the axis as follows?
1 ---> -6month
2 ---> -1 year
3 ---> -1.5 year
...
My plotting code is:
import matplotlib.pyplot as plt
dffinal = df[['6month','final-formula','Question Text','numPatients6month']].drop_duplicates().sort_values(['6month'])
df = dffinal.drop(columns='numPatients6month').groupby(['6month','Question Text']).sum().unstack('Question Text')
df.columns = df.columns.droplevel()
ax=df.plot(kind='bar', stacked=True)
ax2 = ax.twinx()
plt.xticks(fontsize=8, rotation=45)
#ax2.spines['right'].set_position(('axes', 1.0))
dffinal.plot(ax=ax2,x='6month', y='numPatients6month',visible=False)
plt.title('Cognitive Impairment-Stack bar')
plt.show()
I have two dataframes because I have two y-axes.
I tried to use replace:
dffinal['6month'].replace(1, '-6 month',inplace=True)
dffinal['6month'].replace(2, '-1 year',inplace=True)
but it just did not work.
The command plt.xticks should take care of it. Depending on whether the counting of the x axis starts from 0 (as default) or from 1 (as your plot implies) you could try:
# If x starts from 0
plt.xticks(range(12), ['-6month','-1 year',...], fontsize=8, rotation=90)
or
# If x starts from 1
plt.xticks(range(1,13), ['-6month','-1 year',...], fontsize=8, rotation=90)
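In both cases, replace ['-6month','-1 year',...] with the 12-element list of labels you want. Note that pandas bar plots (df.plot(kind='bar')) draw the bars at positions 0 to n-1 and only label them with the index values, so for a pandas bar chart the first variant is the one that applies.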
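A self-contained sketch of the same idea using the Axes methods instead of `plt.xticks`, assuming made-up data; the `labels` list is hypothetical, generated as 6-month steps, so substitute your own 12 labels. Calling `ax.set_xticks`/`ax.set_xticklabels` on the bar axes targets that axes explicitly, whereas `plt.xticks` acts on whichever axes is current, which after `ax.twinx()` is likely the twin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: 12 periods numbered 1..12, like the '6month' column
df = pd.DataFrame({"value": range(1, 13)},
                  index=pd.RangeIndex(1, 13, name="6month"))

# Hypothetical labels, one per 6-month step; replace with your own 12 labels
labels = [f"-{6 * k} month" for k in df.index]

ax = df.plot(kind="bar")
# pandas draws bar i at x = i (0-based), whatever the index values are
ax.set_xticks(range(len(df)))
ax.set_xticklabels(labels, fontsize=8, rotation=90)
ax.figure.canvas.draw()  # render once so the tick label text is populated
```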

how to combine two bar charts from two files in one diagram in matplotlib/pandas

I have two dataframes with the same columns but different content.
I have plotted the dffinal data frame. Now I want to plot another dataframe, dffinal_no, on the same diagram so that the two are comparable.
For example, one bar chart in blue and the same bar chart in another colour, differing only in the y values.
This is the part of the code where I plotted the first data frame:
import matplotlib.pyplot as plt
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
ax=dffinal.plot(kind='bar',x='6month', y='final-formula')
ax2 = ax.twinx()
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()
Now imagine I have another dffinal_no data frame with the same columns, how can I plot it in the same diagram?
This is the first diagram I plotted; I want the other bar chart on this diagram in another color.
The answer by Mohamed Thasin ah is close to what I want, except that the right y-axis is not correct.
I want both data frames to be plotted as (6month, final-formula); the right y-axis should just show the number of patients, as information for the user.
In other words, I do NOT want the first df plotted against final-formula and the second df plotted against numPatients6month.
Update 1: just for reference, here is my code and what my data frames look like:
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
nocidffinal = nocidf[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
nocidffinal=nocidffinal.set_index('6month').sort_index()
dffinal=dffinal.set_index('6month').sort_index()
nocidffinal['final-formula'].plot(kind='bar',color='green',ax=ax1,width=width,position=0)
dffinal['numPatients6month'].plot(kind='bar',color='red',ax=ax2,width=width,position=1)
dffinal content:
,6month,final-formula,numPatients6month
166047.0,1,7.794117647058823,680
82972.0,2,5.720823798627003,437
107227.0,3,5.734767025089606,558
111330.0,4,4.838709677419355,434
95591.0,5,3.3707865168539324,534
95809.0,6,3.611738148984198,443
98662.0,7,3.5523978685612785,563
192668.0,8,2.9978586723768736,467
89460.0,9,0.9708737864077669,515
192585.0,10,2.1653543307086616,508
184325.0,11,1.727447216890595,521
85068.0,12,1.0438413361169103,479
nocidffinal content:
,6month,final-formula,numPatients6month
137797.0,1,3.5934291581108826,974
267492.0,2,2.1705426356589146,645
269542.0,3,2.2106631989596877,769
271950.0,4,2.0,650
276638.0,5,1.5587529976019185,834
187719.0,6,1.9461077844311379,668
218512.0,7,1.1406844106463878,789
199830.0,8,0.8862629246676514,677
269469.0,9,0.3807106598984772,788
293390.0,10,0.9668508287292817,724
254783.0,11,1.2195121951219512,738
300974.0,12,0.9695290858725761,722
To compare the results of two data frames with a bar plot, one approach is to concatenate the two data frames and use a hue column.
For example, consider the dfs below: both contain the same x and y columns, and we want to compare their values. To do this, add a hue column to each df holding a distinguishing constant, as below.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1=pd.DataFrame({'x':[1,2,3,4,5],'y':[10,2,454,121,34]})
df2=pd.DataFrame({'x':[4,1,2,5,3],'y':[54,12,65,12,8]})
df1['hue']=1
df2['hue']=2
res=pd.concat([df1,df2])
sns.barplot(x='x',y='y',data=res,hue='hue')
plt.show()
The result should look like this:
To get two y-axes, try this method:
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('x').sort_index()
df2=df2.set_index('x').sort_index()
df1['y'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['y'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
With your actual input (where df1 and df2 are your two original data frames, not the toy frames above):
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('6month').sort_index()
df2=df2.set_index('6month').sort_index()
df1['final-formula'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['numPatients6month'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
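The follow-up asked for both frames' `final-formula` on the left axis, with patient counts on the right only as information. A minimal sketch of that layout, hedged: it uses only the first four rows of the data posted above, puts both `final-formula` series side by side as grouped bars by combining them into one frame, and draws the patient counts as dashed marker lines on the twin axis (the marker style is an arbitrary choice).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of the dffinal / nocidffinal data posted in the question
dffinal = pd.DataFrame({
    "6month": [1, 2, 3, 4],
    "final-formula": [7.794117647058823, 5.720823798627003,
                      5.734767025089606, 4.838709677419355],
    "numPatients6month": [680, 437, 558, 434]})
nocidffinal = pd.DataFrame({
    "6month": [1, 2, 3, 4],
    "final-formula": [3.5934291581108826, 2.1705426356589146,
                      2.2106631989596877, 2.0],
    "numPatients6month": [974, 645, 769, 650]})

# One column per source frame, aligned on 6month -> grouped bars on the left axis
both = pd.DataFrame({
    "dffinal": dffinal.set_index("6month")["final-formula"],
    "nocidffinal": nocidffinal.set_index("6month")["final-formula"],
})
fig, ax1 = plt.subplots()
both.plot(kind="bar", ax=ax1, color=["blue", "green"])
ax1.set_ylabel("final-formula")

# Patient counts on the right axis, for information only
ax2 = ax1.twinx()
positions = range(len(both))  # pandas places bar i at x = i
ax2.plot(positions, dffinal["numPatients6month"], "o--", color="blue")
ax2.plot(positions, nocidffinal["numPatients6month"], "o--", color="green")
ax2.set_ylabel("numPatients6month")
```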

how to plot 2 histograms side by side?

I have 2 dataframes. I want to plot a histogram of the column 'rate' for each, side by side. How can I do it?
I tried this:
import matplotlib.pyplot as plt
plt.subplot(1,2,1)
dflux.hist('rate' , bins=100)
plt.subplot(1,2,2)
dflux2.hist('rate' , bins=100)
plt.tight_layout()
plt.show()
It did not have the desired effect. It showed two blank charts then one populated chart.
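Use plt.subplots to define a figure with two axes, then pass the axes to plot on to hist through the ax parameter. Without ax, DataFrame.hist draws into a new figure of its own rather than into the subplot you just created.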
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
Demo
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dflux = pd.DataFrame(dict(rate=np.random.randn(10000)))
dflux2 = pd.DataFrame(dict(rate=np.random.randn(10000)))
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
plt.tight_layout()
plt.show()