Annotate text in facetgrid of sns relplot in python - pandas

Using the following data frame (utilities):
Security_Name Rating Duracion Spread
0 COLBUN 3.95 10/11/27 BBB 6.135749 132
1 ENELGX 4 1/4 04/15/24 BBB+ 3.197206 124
2 PROMIG 3 3/4 10/16/29 BBB- 7.628048 243
3 IENOVA 4 3/4 01/15/51 BBB 15.911632 364
4 KALLPA 4 7/8 05/24/26 BBB- 4.792474 241
5 TGPERU 4 1/4 04/30/28 BBB+ 4.935607 130
I am trying to create a seaborn relplot that annotates the scatter plot points in each facet of the FacetGrid. However, the output I get has no annotations in any of the plots.
I have tried the following code:
sns.relplot(x="Duracion", y="Spread", col="Rating", data=utilities)
I really don't know where to start to add the annotations to this relplot using FacetGrid. The annotations should be the values of the column Security_Name.
Please advise on the modifications. Thanks in advance.

Using FacetGrid and a custom annotation function, you can get the desired result. Note that there is a good chance the annotations will overlap given the example dataframe provided:
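import matplotlib.pyplot as plt
import seaborn as sns
def annotate_points(x, y, t, **kwargs):
    # map_dataframe passes each facet's subset of rows as the 'data' keyword
    ax = plt.gca()
    data = kwargs.pop('data')
    for i, row in data.iterrows():
        ax.annotate(row[t], xy=(row[x], row[y]))
# df is the data frame from the question (called utilities there)
g = sns.FacetGrid(col="Rating", data=df)
g.map(sns.scatterplot, "Duracion", "Spread")
g.map_dataframe(annotate_points, "Duracion", "Spread", 'Security_Name')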
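If you would rather keep the original relplot call, a minimal sketch along the same lines is to loop over the facets it returns and annotate each one. This assumes a seaborn version that provides FacetGrid.axes_dict (0.11 or later, as far as I know). The text offset and font size are my own choices to reduce overlap, not something from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Data copied from the question
utilities = pd.DataFrame({
    "Security_Name": ["COLBUN 3.95 10/11/27", "ENELGX 4 1/4 04/15/24",
                      "PROMIG 3 3/4 10/16/29", "IENOVA 4 3/4 01/15/51",
                      "KALLPA 4 7/8 05/24/26", "TGPERU 4 1/4 04/30/28"],
    "Rating": ["BBB", "BBB+", "BBB-", "BBB", "BBB-", "BBB+"],
    "Duracion": [6.135749, 3.197206, 7.628048, 15.911632, 4.792474, 4.935607],
    "Spread": [132, 124, 243, 364, 241, 130],
})

g = sns.relplot(x="Duracion", y="Spread", col="Rating", data=utilities)

# axes_dict maps each Rating value to the Axes of its facet
for rating, ax in g.axes_dict.items():
    subset = utilities[utilities["Rating"] == rating]
    for _, row in subset.iterrows():
        ax.annotate(row["Security_Name"],
                    xy=(row["Duracion"], row["Spread"]),
                    xytext=(4, 4), textcoords="offset points",  # nudge the label off the marker
                    fontsize=8)
```

Call g.savefig(...) or plt.show() afterwards to see the result.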

Related

Python scatter plot vs line plot and column values

Wondering if anyone could clarify this for me.
Basically, I have a dataframe that looks like this:
Data_Value
Month_Day
01-01 1.1
01-02 3.9
01-03 3.9
01-04 4.4
I can generate a line plot based on this dataframe using this code:
ax.plot(df.values)
I have had some problems generating a scatter plot from the same data frame and I am wondering if it's possible given that there is a "-" in the index column of the dataframe. However, I am also thinking that since it's possible to generate a line plot it should also be possible to do a scatter plot?
Any insights would be most welcome.
When I try this code:
df = df.reset_index()
df['Month_Day'] = pd.to_datetime(df['Month_Day'], format='%m-%d')
df.plot(type='scatter',x='Month_Day',y='Data_Value')
I get this error msg:
AttributeError: Unknown property type
My Pandas version: 0.19.2
Not sure if I understood your issue totally, but if it's just to create scatter plots, you can try resetting the index to convert 'Month_Day' to a regular column and then converting it to datetime. I tried the following:
df.reset_index(inplace=True)
df['Month_Day'] = pd.to_datetime(df['Month_Day'], format='%m-%d')
# you can replace the year with any value, using 2020 as an example
df['Month_Day'] = [val.replace(year=2020) for val in df['Month_Day']]
print(df)
Output:
Month_Day Data_Value
0 2020-01-01 1.1
1 2020-01-02 3.9
2 2020-01-03 3.9
3 2020-01-04 4.4
Then generate a scatter plot:
import matplotlib.pyplot as plt
# generate the plot
plt.scatter(df['Month_Day'], df['Data_Value'])
plt.show()
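A slightly more compact sketch of the same idea: prefix the placeholder year as a string before parsing, then format the ticks back to month-day so the made-up year never shows on the axis. The year 2020 and the DateFormatter call are my additions, not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Data copied from the question, with Month_Day as the index
df = pd.DataFrame({"Data_Value": [1.1, 3.9, 3.9, 4.4]},
                  index=pd.Index(["01-01", "01-02", "01-03", "01-04"], name="Month_Day"))

df = df.reset_index()
# Prepend an arbitrary year (2020 here) and parse in one vectorized step
df["Month_Day"] = pd.to_datetime("2020-" + df["Month_Day"], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.scatter(df["Month_Day"], df["Data_Value"])
# Show only month-day on the ticks, hiding the placeholder year
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
fig.autofmt_xdate()
```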
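You can do it, but I believe you have to have 'Month_Day' in the columns, so reset the index first. Note also that the keyword is kind, not type; passing type is what raises the "Unknown property type" error.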
df = df.reset_index()
df.plot(kind='scatter',x='Month_Day',y='Data_Value')

How to show multiple timeseries plots using seaborn

I'm trying to generate 4 plots from a DataFrame using Seaborn
Date A B C D
2019-04-05 330.665 161.975 168.69 0
2019-04-06 322.782 150.243 172.539 0
2019-04-07 322.782 150.243 172.539 0
2019-04-08 295.918 127.801 168.117 0
2019-04-09 282.674 126.894 155.78 0
2019-04-10 293.818 133.413 160.405 0
I have cast the dates using pd.to_datetime and the numbers using pd.to_numeric. Here is the df.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 460 to 465
Data columns (total 5 columns):
Date 6 non-null datetime64[ns]
A 6 non-null float64
B 6 non-null float64
C 6 non-null float64
D 6 non-null float64
dtypes: datetime64[ns](1), float64(4)
memory usage: 288.0 bytes
I can do a wide column plot by just calling .plot() on df.
However, the legend of the plot is covering the plot itself.
I would instead like to have 4 separate plots in 1 diagram, with labels added to each plot, and have tried using lmplot to achieve this.
I first melted the data:
df=pd.melt(df,id_vars='Date', var_name='Var', value_name='Unit')
And then tried lmplot
sns.lmplot(x = df['Date'], y='Unit', col='Var', data=df)
However, I get the traceback:
TypeError: Invalid comparison between dtype=datetime64[ns] and str
I have also tried df.set_index('Date') and replotting with x=df.index, and that gave me the same error.
The data can be plotted using Google Sheets but I am trying to automate a workflow where the chart can be generated and sent via Slack to selected recipients.
I hope I have expressed myself clearly enough as I am rather new to Python and Seaborn and hope to get some help from the experts here.
Regarding the legend you can just use .legend(loc="upper left", bbox_to_anchor=(1,1)) as in this example
%matplotlib inline
import pandas as pd
import numpy as np
data = np.random.rand(10,4)
df = pd.DataFrame(data, columns=["A", "B", "C", "D"])
df.plot()\
.legend(loc="upper left", bbox_to_anchor=(1,1));
For the second point, if I understand correctly, you can start from:
df.plot(subplots=True, layout=(2,2));
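To make that concrete for the question's data, here is a minimal sketch, assuming Date is first moved into the index so that pandas uses it for the x-axis. The figsize and sharex values are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Data copied from the question (wide format, before any melt)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-04-05", "2019-04-06", "2019-04-07",
                            "2019-04-08", "2019-04-09", "2019-04-10"]),
    "A": [330.665, 322.782, 322.782, 295.918, 282.674, 293.818],
    "B": [161.975, 150.243, 150.243, 127.801, 126.894, 133.413],
    "C": [168.69, 172.539, 172.539, 168.117, 155.78, 160.405],
    "D": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

# One panel per column, arranged in a 2x2 grid, with dates on the x-axis
axes = df.set_index("Date").plot(subplots=True, layout=(2, 2),
                                 figsize=(10, 6), sharex=True)

# Each panel carries its own one-entry legend, so nothing covers the other curves
for ax in axes.flat:
    ax.legend(loc="upper left")
```

The figure can then be written to a file with axes.flat[0].figure.savefig(...), which fits the automated workflow described in the question.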

how to plot a dataframe with two different axes in pandas matplotlib

So my data frame is like this:
6month final-formula numPatients6month
160243.0 1 0.401193 417
172110.0 2 0.458548 323
157638.0 3 0.369403 268
180306.0 4 0.338761 238
175324.0 5 0.247011 237
170709.0 6 0.328555 218
195762.0 7 0.232895 190
172571.0 8 0.319588 194
172055.0 9 0.415517 145
174609.0 10 0.344697 132
174089.0 11 0.402965 106
196130.0 12 0.375000 80
and I am plotting the 6month and final-formula columns:
dffinal.plot(kind='bar',x='6month', y='final-formula')
import matplotlib.pyplot as plt
plt.show()
So far so good: it shows 6month on the x-axis and final-formula on the y-axis.
What I want is to show numPatients6month in the same plot, but on another y-axis,
as in the diagram below. I want to show numPatients6month at position 1, or simply show that number above each bar.
I tried to do that with twinx, but it seems to be meant for the case where we have two plots and want to draw them in the same figure.
fig = plt.figure()
ax = fig.add_subplot(111)
ax2 = ax.twinx()
ax.set_ylabel('numPatients6month')
I appreciate your help :)
This is the solution that resolved it. I am sharing it here in case it helps someone :)
ax=dffinal.plot(kind='bar',x='6month', y='final-formula')
import matplotlib.pyplot as plt
ax2 = ax.twinx()
ax2.spines['right'].set_position(('axes', 1.0))
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()
Store the AxesSubplot in a variable called ax:
ax = dffinal.plot(kind='bar',x='6month', y='final-formula')
and then
ax.tick_params(labeltop=False, labelright=True)
This will bring the labels to the right as well.
Is this enough, or would you also like to know how to add values to the top of the bars? Your question indicated that either of the two would do.
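In case it helps, here is a minimal sketch of the second option (writing numPatients6month above each bar), using the first rows of the question's data. It assumes the bars in ax.patches come in the same order as the rows of the frame, which holds for a single-series pandas bar plot as far as I know. The 3-point vertical offset is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# First four rows of the question's data frame
dffinal = pd.DataFrame({
    "6month": [1, 2, 3, 4],
    "final-formula": [0.401193, 0.458548, 0.369403, 0.338761],
    "numPatients6month": [417, 323, 268, 238],
})

ax = dffinal.plot(kind="bar", x="6month", y="final-formula", legend=False)

# Pair each bar with the patient count from the same row and write it above the bar
for patch, n in zip(ax.patches, dffinal["numPatients6month"]):
    ax.annotate(str(n),
                xy=(patch.get_x() + patch.get_width() / 2, patch.get_height()),
                xytext=(0, 3), textcoords="offset points",
                ha="center", va="bottom")
```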

How to add an extra number on top of the each bar on barchart

To explain why this question is different from the linked one:
as far as I understand, the linked answer gets the height from the chart itself, but in my case the numPatients6month column is not in the chart at all; I only have it in the data frame.
So I have a bar chart. It contains two bars for each x-axis value, and each bar is read from a different data frame.
This is the code I use to plot the bar chart:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'rg')")
dffinal['CI-noCI']='Cognitive Impairement'
nocidffinal['CI-noCI']='Non Cognitive Impairement'
res=pd.concat([dffinal,nocidffinal])
sns.barplot(x='6month',y='final-formula',data=res,hue='CI-noCI').set_title(fs)
plt.xticks(fontsize=8, rotation=45)
plt.show()
As you can see, there are two data frames. I plot dffinal in green and nocidffinal in red.
Some more explanation: both dffinal and nocidffinal are based on (6month, final-formula).
This is my nocidffinal data frame:
6month final-formula numPatients6month
137797.0 1 0.035934 974
267492.0 2 0.021705 645
269542.0 3 0.022107 769
271950.0 4 0.020000 650
276638.0 5 0.015588 834
187719.0 6 0.019461 668
218512.0 7 0.011407 789
199830.0 8 0.008863 677
269469.0 9 0.003807 788
293390.0 10 0.009669 724
254783.0 11 0.012195 738
300974.0 12 0.009695 722
and dffinal:
6month final-formula numPatients6month
166047.0 1 0.077941 680
82972.0 2 0.057208 437
107227.0 3 0.057348 558
111330.0 4 0.048387 434
95591.0 5 0.033708 534
95809.0 6 0.036117 443
98662.0 7 0.035524 563
192668.0 8 0.029979 467
89460.0 9 0.009709 515
192585.0 10 0.021654 508
184325.0 11 0.017274 521
85068.0 12 0.010438 479
As you can see, both data frames have a numPatients6month column, which I would like to show on top of each bar.
I do NOT want to change the bar chart and group it based on this column; I just want to show this number as extra information to the user on top of each bar.
Thanks for your time :)
If you collect your numPatients6month columns into one iterable, in the order the bars appear in the chart, then using the other Stack Overflow answer (also in the docs here) you can place the text on top correctly.
I used the code below (adapted from this SO answer). It combines multiple columns one row after another (i.e. it will get all your numPatients6month columns in the chart order):
vals = pd.concat([nocidffinal.numPatients6month, dffinal.numPatients6month], axis=1)
vals = vals.stack().reset_index(level=[0,1], drop=True)
This is my full code
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['axes.prop_cycle'] = ("cycler('color', 'rg')")
# Copied to clipboard from SO question above (copy each table before its read_clipboard call)
# Comment out if you already have your dataframes
nocidffinal = pd.read_clipboard().reset_index()
dffinal = pd.read_clipboard().reset_index()
# Label the rows only after both data frames exist
dffinal['CI-noCI']='Cognitive Impairement'
nocidffinal['CI-noCI']='Non Cognitive Impairement'
res=pd.concat([dffinal,nocidffinal])
# This will merge columns in order of the chart
vals = pd.concat([nocidffinal.numPatients6month, dffinal.numPatients6month], axis=1)
vals = vals.stack().reset_index(level=[0,1], drop=True)
# Plot the chart
ax = sns.barplot(x='6month', y='final-formula', data=res, hue='CI-noCI')
_ = plt.xticks(fontsize=8, rotation=45)
# Add the values on top of each correct bar
for idx, p in enumerate(ax.patches):
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + height*.01,
            vals[idx],
            ha="center")
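One caveat: the loop above relies on ax.patches coming back in the same order as vals, and that order depends on how the installed seaborn version draws the hue levels. A minimal sketch that avoids the assumption is to walk ax.containers instead, which (as far as I know) holds one group of bars per hue level. Passing order and hue_order explicitly, and using only the first three months of the question's data, are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# First three rows of each data frame from the question
dffinal = pd.DataFrame({"6month": [1, 2, 3],
                        "final-formula": [0.077941, 0.057208, 0.057348],
                        "numPatients6month": [680, 437, 558]})
nocidffinal = pd.DataFrame({"6month": [1, 2, 3],
                            "final-formula": [0.035934, 0.021705, 0.022107],
                            "numPatients6month": [974, 645, 769]})
dffinal["CI-noCI"] = "Cognitive Impairement"
nocidffinal["CI-noCI"] = "Non Cognitive Impairement"
res = pd.concat([dffinal, nocidffinal], ignore_index=True)

# Fix the bar order explicitly so the labels can be matched without guessing
months = sorted(res["6month"].unique().tolist())
levels = ["Cognitive Impairement", "Non Cognitive Impairement"]
ax = sns.barplot(x="6month", y="final-formula", hue="CI-noCI", data=res,
                 order=months, hue_order=levels)

# Pair each container with its hue level, then each bar with its month's count
for level, container in zip(levels, ax.containers):
    counts = (res[res["CI-noCI"] == level]
              .set_index("6month")["numPatients6month"]
              .reindex(months))
    for bar, n in zip(container, counts):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() * 1.01,
                str(n), ha="center")
```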

Scatter plot of Multiindex GroupBy()

I'm trying to make a scatter plot of a GroupBy() with Multiindex (http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex). That is, I want to plot one of the labels on the x-axis, another label on the y-axis, and the mean() as the size of each point.
df['RMSD'].groupby([df['Sigma'],df['Epsilon']]).mean() returns:
Sigma_ang Epsilon_K
3.4 30 0.647000
40 0.602071
50 0.619786
3.6 30 0.646538
40 0.591833
50 0.607769
3.8 30 0.616833
40 0.590714
50 0.578364
Name: RMSD, dtype: float64
And I'd like to plot something like: plt.scatter(x=Sigma, y=Epsilon, s=RMSD)
What's the best way to do this? I'm having trouble getting the proper Sigma and Epsilon values for each RMSD value.
+1 to Vaishali Garg. Based on his comment, the following works:
df_mean = df['RMSD'].groupby([df['Sigma'],df['Epsilon']]).mean().reset_index()
plt.scatter(df_mean['Sigma'], df_mean['Epsilon'], s=100.*df_mean['RMSD'])
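For anyone who wants to try this end to end, here is a minimal self-contained sketch. The raw values are hypothetical stand-ins for the question's df, and passing as_index=False to groupby is just another way of getting the same flat frame that reset_index() produces.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw data standing in for the question's df
df = pd.DataFrame({
    "Sigma":   [3.4, 3.4, 3.4, 3.4, 3.6, 3.6, 3.6, 3.6],
    "Epsilon": [30, 30, 40, 40, 30, 30, 40, 40],
    "RMSD":    [0.60, 0.70, 0.55, 0.65, 0.62, 0.68, 0.58, 0.60],
})

# as_index=False keeps Sigma and Epsilon as ordinary columns instead of a MultiIndex
df_mean = df.groupby(["Sigma", "Epsilon"], as_index=False)["RMSD"].mean()

fig, ax = plt.subplots()
# Marker area scales with the mean RMSD of each (Sigma, Epsilon) pair
ax.scatter(df_mean["Sigma"], df_mean["Epsilon"], s=100.0 * df_mean["RMSD"])
ax.set_xlabel("Sigma")
ax.set_ylabel("Epsilon")
```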