Avoiding AnchoredText overlapping in matplotlib

My goal is to draw a scatter plot with AnchoredText. Graph (1) below is as expected,
but graph (2) shows overlapping AnchoredTexts. How can I keep the texts from overlapping?
My data and plotting function are:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import AnchoredText
data=pd.DataFrame({'val':[1, 1, 2, 2.1, 2, 2.5, 2.3],'site':['a','a','a','b','b','b','b'],
    'X2':[4, 5 ,6 ,10, 10, 11, 11], 'X3':[100,100,200,200,200,300,300],
    'applydate':[1101,1102,1201,1202,1204,1204,1204],
    'X1':['b','b','h','b','b','h','h'] })
def my_scatter(x,y, **kwargs):
    plt.scatter(x=x, y=y,**kwargs)
    mx = np.mean(x)
    my = np.mean(y)
    strText='AVG_H:'+str(np.round(mx,3))+'\nAVG_V:'+str(np.round(my,3))
    textbox = AnchoredText(strText, loc='lower right', frameon=False)
    plt.gca().add_artist(textbox)
Graph (1), which is OK:
g = sns.FacetGrid(data,col='site',height=3)
g.map(my_scatter, "X2", "val",s=100, alpha=.5)
g.add_legend()
And graph (2), which shows overlapping texts. (The problem is that the statistics for the blue dots and the orange dots overlap.)
g = sns.FacetGrid(data,col='site',height=3, hue='X1') # I've assigned column X1 to hue variable.
g.map(my_scatter, "X2", "val",s=100, alpha=.5)
g.add_legend()

Related

Distinct color dots on Multidimensional Scaling plot (MDS) with plt.annotate()

Currently my MDS plot annotates each point with its label. Instead of having the numberings on the MDS plot, I would like to replace them with dots. My desired output is that the point annotated as 'highest' will be a blue dot, the point annotated as 'lowest' will be a red dot, and all the other points will be grey dots.
Code to reproduce the MDS plot above.
import numpy as np
import scipy
import matplotlib.pyplot as plt
from sklearn.metrics import pairwise_distances #jaccard diss.
from sklearn import manifold # multidimensional scaling
foods_binary = np.random.randint(2, size=(100, 10)) #initial dataset
print(foods_binary.shape)
dis_matrix = pairwise_distances(foods_binary, metric = 'jaccard')
mds_model = manifold.MDS(n_components = 2, random_state = 123,
    dissimilarity = 'precomputed')
# fit_transform fits the model and returns the coordinates in one call
mds_coords = mds_model.fit_transform(dis_matrix)
plt.figure()
plt.scatter(mds_coords[:,0],mds_coords[:,1],
    facecolors = 'none', edgecolors = 'none') # points drawn without colour (invisible)
labels = [ 1, 2, 3, 4, 5, 'highest', 5, 6, 7, 8, 9, 'lowest']
for label, x, y in zip(labels, mds_coords[:,0], mds_coords[:,1]):
    plt.annotate(label, (x,y), xycoords = 'data')
plt.xlabel('First Dimension')
plt.ylabel('Second Dimension')
plt.title('Dissimilarity among food items')
plt.show()
How can I achieve what I desire above? Thanks for any suggestions.
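A minimal sketch of one way to get the coloured dots, assuming the labels list from the question: map each label to a colour and pass the resulting list to scatter. Random coordinates stand in for mds_coords so the example runs without scikit-learn, and the name special is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = [1, 2, 3, 4, 5, 'highest', 5, 6, 7, 8, 9, 'lowest']

# Stand-in for the first len(labels) rows of mds_coords (assumption:
# random points are used here so the sketch runs on its own).
rng = np.random.default_rng(123)
coords = rng.normal(size=(len(labels), 2))

# Special labels get their own colour; every other label falls back to grey.
special = {'highest': 'blue', 'lowest': 'red'}
colors = [special.get(label, 'grey') for label in labels]

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=colors)
ax.set_xlabel('First Dimension')
ax.set_ylabel('Second Dimension')
ax.set_title('Dissimilarity among food items')
```

With the real data, coords would be mds_coords[:len(labels)], because the question's zip call only reaches the first len(labels) points.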

How to start Seaborn Logarithmic Barplot at y=1

I have a problem figuring out how to make Seaborn show the right values in a logarithmic barplot. Ideally, each of my values would be 1. My data series (5, 2, 1, 0.5, 0.2) deviates from unity, and I want to visualize these deviations in a logarithmic barplot. However, the standard log barplot does not show them the way I expect.
The values under one are drawn rising from -infinity up to their value, whereas they ought to extend downward from 1.
Strangely enough, I was unable to find a Seaborn, Pandas or Matplotlib attribute to "snap" the bars to a different horizontal axis, to "align" them, or to set a ymin/ymax. I suspect I can't find it because I don't know the right search terms. The semi-solutions I found either did not match what I was looking for or lacked either a baseline at 1 or a logarithmic y-axis. One attempt, using some janky Matplotlib lines, is in cell {3} below.
If someone knows the right terms or a solution, thank you in advance.
Here are the Jupyter cells I used:
{1}
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'X': ['A','B','C','D','E'], 'Y': [5,2,1,0.5,0.2]}
df = pd.DataFrame(data)
{2}
%matplotlib widget
g = sns.catplot(data=df, kind="bar", y = "Y", x = "X", log = True)
{3}
%matplotlib widget
plt.vlines(x=data['X'], ymin=1, ymax=data['Y'])
You could let the bars start at 1 instead of at 0. You'll need to use sns.barplot directly.
The example code subtracts 1 from all y-values and sets the bar bottom at 1, so each bar still ends at its original value.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [5, 2, 1, 0.5, 0.2]}
df = pd.DataFrame(data)
ax = sns.barplot(y=df["Y"] - 1, x=df["X"], bottom=1, log=True, palette='flare_r')
ax.axhline(y=1, c='k')
# change the y-ticks, as the default shows too few in this case
ax.set_yticks(np.append(np.arange(.2, .8, .1), np.arange(1, 7, 1)), minor=False)
ax.set_yticks(np.arange(.3, 6, .1), minor=True)
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.0f}' if x >= 1 else f'{x:.1f}')
ax.yaxis.set_minor_formatter(NullFormatter())
ax.bar_label(ax.containers[0], labels=df["Y"])
sns.despine()
plt.show()
PS: With these specific values, the plot would also work without a log scale.
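The bottom-at-1 idea from the answer can also be sketched in plain matplotlib, which makes the arithmetic easy to check: a bar with bottom 1 and height y - 1 ends at 1 + (y - 1) = y, growing upward for values above 1 and downward for values below 1. The variable bar_tops is only there to verify this and is not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = ['A', 'B', 'C', 'D', 'E']
y = np.array([5, 2, 1, 0.5, 0.2])

fig, ax = plt.subplots()
# Each bar starts at 1 and has height y - 1, so it ends exactly at y:
# positive heights grow upward, negative heights grow downward.
bars = ax.bar(x, y - 1, bottom=1)
ax.set_yscale('log')
ax.axhline(1, color='k')  # baseline at y = 1

# Where each bar ends: bottom edge plus (possibly negative) height.
bar_tops = [bar.get_y() + bar.get_height() for bar in bars]
```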

Matplotlib - Line Plot

I am trying to plot an array of 101 rows × 12 columns, with row #1 highlighted, using the code below:
plt.plot(HW.transpose()[1:101],color = 'grey', alpha = 0.1)
plt.plot(HW.transpose()[0],color = 'red', linewidth = 3, alpha = 0.7)
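The only issue with this graph is that 'S1' somehow ends up last on the x-axis instead of first. What am I doing wrong?
HW.transpose()[1:101] doesn't select the desired columns: on a DataFrame, a plain slice selects rows, so the grey lines start at 'S2' and 'S1' is only added to the axis when the red line is drawn. You can use HW.transpose().iloc[:, 1:101] instead: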
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
HW = pd.DataFrame(np.random.randn(101, 12).cumsum(axis=1), columns=[f'S{i}' for i in range(1, 13)])
plt.plot(HW.transpose().iloc[:, 1:101], color='grey', alpha=0.1)
plt.plot(HW.transpose().iloc[:, 0], color='red', linewidth=3, alpha=0.7)
plt.show()
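The difference between the two selections can be checked without plotting. This small sketch uses a zero-filled frame of the same shape as HW (an assumption, since the real data is not shown) and the made-up names row_slice and col_slice.

```python
import numpy as np
import pandas as pd

HW = pd.DataFrame(np.zeros((101, 12)), columns=[f'S{i}' for i in range(1, 13)])
T = HW.transpose()              # 12 rows (S1..S12) x 101 columns

row_slice = T[1:101]            # plain slice: selects rows S2..S12
col_slice = T.iloc[:, 1:101]    # explicit column slice: columns 1..100

print(row_slice.shape, list(row_slice.index[:2]))   # (11, 101) ['S2', 'S3']
print(col_slice.shape, list(col_slice.index[:2]))   # (12, 100) ['S1', 'S2']
```

Because row_slice has no 'S1' row, the first plot call builds an x-axis that starts at 'S2'.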

How to plot lines linking medians of multiple violin distributions in seaborn?

I am struggling to plot a dotted line between the median values (and the min and max) per type across stacked violin distributions.
I tried superposing a violin plot with a seaborn.lineplot, but it failed. I'm not sure this approach lets me draw dotted lines and also link the min and max of distributions of the same type. I also tried seaborn.lineplot on its own, but there the challenge is to plot the min and max of the distribution at each x-axis value.
Here is an example dataset and the code for the violin plot in seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x=[0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8]
cate=['a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c']
y=[1.1,1.12,1.13,1.13,3.1,3.12,3.13,3.13,5.1,5.12,5.13,5.13,2.2,2.22,2.25,2.23,4.2,4.22,4.25,4.23,6.2,6.22,6.25,6.23,2.2,2.22,2.24,2.23,4.2,4.22,4.24,4.23,6.2,6.22,6.24,6.23,1.1,1.13,1.14,1.12,3.1,3.13,3.14,3.12,5.1,5.13,5.14,5.12]
my_pal =['red','green', 'purple']
df = pd.DataFrame({'x': x, 'Type': cate, 'y': y})
ax=sns.catplot(y='y', x='x',data=df, hue='Type', palette=my_pal, kind="violin",dodge =False)
sns.lineplot(y='y', x='x',data=df, hue='Type', palette=my_pal, ci=100,legend=False)
plt.show()
but it plots the lines only over a small part on the left of the plot. Is there a trick to superpose a lineplot on a violin plot?
For the line plot, 'x' is considered numerical. However, for the violin plot 'x' is considered categorical (positioned at 0, 1, 2, ...).
A solution is to convert 'x' to strings to have both plots consider it as categorical.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
my_pal = ['red', 'green', 'purple']
N = 40
df = pd.DataFrame({'x': np.random.randint(1, 6, N*3) * 0.2,
                   'y': np.random.uniform(0, 1, N*3) + np.tile([2, 4, 6], N),
                   'Type': np.tile(list('abc'), N)})
df['x'] = [f'{x:.1f}' for x in df['x']]
ax = sns.violinplot(y='y', x='x', data=df, hue='Type', palette=my_pal, dodge=False)
ax = sns.lineplot(y='y', x='x', data=df, hue='Type', palette=my_pal, ci=100, legend=False, ax=ax)
ax.margins(0.15) # slightly more padding for x and y axis
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left')
plt.tight_layout()
plt.show()
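To link the medians, minima and maxima explicitly, as the question asks, one option is to compute them with pandas and draw them at the violins' integer positions. This is a sketch under assumptions: the data is synthetic, and the names order, stats and positions are introduced here for illustration. It is not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

my_pal = ['red', 'green', 'purple']
types = list('abc')
order = ['0.2', '0.4', '0.6', '0.8']   # x as strings, so it is categorical

# Synthetic data (assumption): 12 samples per (x, Type) pair.
rng = np.random.default_rng(0)
df = pd.DataFrame([
    {'x': xv, 'Type': t, 'y': 2 * (i + 1) + rng.uniform(0, 1)}
    for xv in order for i, t in enumerate(types) for _ in range(12)
])

ax = sns.violinplot(data=df, x='x', y='y', hue='Type', order=order,
                    hue_order=types, palette=my_pal, dodge=False)

# One row per (Type, x) holding the three statistics to connect.
stats = df.groupby(['Type', 'x'])['y'].agg(['min', 'median', 'max'])
positions = np.arange(len(order))      # violins sit at 0, 1, 2, ...
for t, color in zip(types, my_pal):
    s = stats.loc[t].reindex(order)
    ax.plot(positions, s['median'].to_numpy(), 'o--', color=color)  # medians
    ax.plot(positions, s['min'].to_numpy(), ':', color=color)       # minima
    ax.plot(positions, s['max'].to_numpy(), ':', color=color)       # maxima
```

Plotting against the integer positions avoids relying on how strings are mapped to axis positions.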

How to plot a kernel density estimate on a seaborn scatterplot

I would like to plot the same as shown in the picture (but only the red part). The curve is a kernel density estimate based only on the x-values (the y-values are irrelevant and are actually all 1, 2 or 3; they are plotted like this only to distinguish between red and blue). I have plotted the scatterplot, but how can I include the kernel density curve on it? (The black dotted lines in the curve are just the quartiles and the median.)
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KernelDensity
%matplotlib inline
# Change plotting style to ggplot
plt.style.use('ggplot')
from matplotlib.font_manager import FontProperties
X_plot = np.linspace(0, 30, 1000)[:, np.newaxis]
X1 = df[df['Zustandsklasse']==1]['Verweildauer'].values.reshape(-1,1)
X2 = df[df['Zustandsklasse']==2]['Verweildauer'].values.reshape(-1,1)
X3 = df[df['Zustandsklasse']==3]['Verweildauer'].values.reshape(-1,1)
#print(X1)
ax=sns.scatterplot(x="Verweildauer", y="CS_bandwith", data=df, legend="full", alpha=1)
kde=KernelDensity(kernel='gaussian').fit(X1)
log_dens = kde.score_samples(X_plot)
ax.plot(X_plot[:,0], np.exp(log_dens), color ="blue", linestyle="-", label="Gaussian Kernel")
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ax.invert_yaxis()
plt.ylim(5.5, .5)
ax.set_ylabel("Zustandsklasse")
ax.set_xlabel("Verweildauer in Jahren")
handles, labels = ax.get_legend_handles_labels()
# create the legend again skipping this first entry
leg = ax.legend(handles[1:], labels[1:], loc="lower right", ncol=2, facecolor='silver', fontsize= 7)
ax.set_xticks(np.arange(0, 30, 5))
ax2 = ax.twinx()
#get the ticks at the same heights as the left axis
ax2.set_ylim(ax.get_ylim())
s=[(df["Zustandsklasse"] == t).sum() for t in range(1, 6)]
s.insert(0, 0)
print(s)
ax2.set_yticklabels(s)
ax2.set_ylim(ax.get_ylim())
ax2.set_ylabel("Anzahl Beobachtungen")
ax2.grid(False)
#plt.tight_layout()
plt.show()
[Image: plotting target]
[Image: what is plotted with the code above]
It's much easier if you use subplots. Here is an example with seaborn's Titanic dataset:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
titanic = sns.load_dataset('titanic')
fig, ax = plt.subplots(nrows=3, sharex=True)
ax[2].set_xlabel('Age')
for i in [1, 2, 3]:
    age_i = titanic[titanic['pclass'] == i]['age']
    ax[i-1].scatter(age_i, [0] * len(age_i))
    # fill= replaces the deprecated shade= argument (seaborn >= 0.11)
    sns.kdeplot(age_i, ax=ax[i-1], fill=True, legend=False)
    ax[i-1].set_yticks([])
    ax[i-1].set_ylim(-0.01)
    ax[i-1].set_ylabel('Class ' + str(i))
plt.show()
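If a single axes is preferred over subplots, one option is to rescale the density so that it fits next to the row of points. In the question's code the raw density is plotted on an axis limited to 5.5..0.5, and density values over a range of about 30 years are typically far below 0.5, so the curve probably lands outside the visible range. The sketch below is an assumption-laden illustration: gaussian_kde_1d is a hand-written NumPy estimator (not a library function), the samples are synthetic, and the bandwidth and max_height values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

rng = np.random.default_rng(0)
samples = rng.gamma(shape=4, scale=2, size=200)  # stand-in for one class's x-values
grid = np.linspace(-10, 50, 1200)
dens = gaussian_kde_1d(samples, grid, bandwidth=1.5)

row = 1             # y position at which this class's points are drawn
max_height = 0.45   # peak height of the curve, in y-axis units
# Subtract because the y-axis is inverted: smaller y is drawn higher up.
curve = row - max_height * dens / dens.max()

fig, ax = plt.subplots()
ax.scatter(samples, np.full_like(samples, row), s=10)
ax.plot(grid, curve, color='blue')
ax.set_ylim(5.5, 0.5)   # same inverted limits as in the question
```

The rescaled curve no longer shows absolute density values, only the shape of the distribution, which matches the use in the target picture as far as it is described.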