Customize the axis label in seaborn jointplot - numpy

I seem to be stuck on a relatively simple problem and couldn't fix it after an hour of searching and a lot of experimenting.
I have two numpy arrays x and y and I am using seaborn's jointplot to plot them:
sns.jointplot(x, y)
Now I want to label the x-axis and y-axis as "X-axis label" and "Y-axis label" respectively. If I use plt.xlabel, the labels go to the marginal distribution. How can I make them appear on the joint axes?

sns.jointplot returns a JointGrid object, which gives you access to the underlying matplotlib axes so that you can manipulate them directly.
import seaborn as sns
import numpy as np
# example data
X = np.random.randn(1000,)
Y = 0.2 * np.random.randn(1000) + 0.5
h = sns.jointplot(x=X, y=Y)
# JointGrid has a convenience function
h.set_axis_labels('x', 'y', fontsize=16)
# or set labels via the axes objects
h.ax_joint.set_xlabel('new x label', fontweight='bold')
# also possible to manipulate the histogram plots this way, e.g.
h.ax_marg_y.grid(True) # with ugly consequences...
# labels appear outside of plot area, so auto-adjust
h.figure.tight_layout()
(The problem with your attempt is that functions such as plt.xlabel("text") operate on the current axes, which is not the central one in sns.jointplot; the object-oriented interface is explicit about which axes it operates on.)
Note that the last command uses the figure attribute of the JointGrid. The initial version of this answer used the simpler - but not object-oriented - approach via the matplotlib.pyplot interface.
To use the pyplot interface:
import matplotlib.pyplot as plt
plt.tight_layout()

Alternatively, you can specify the axis labels as the column names of a pandas DataFrame passed to jointplot.
import pandas as pd
import seaborn as sns
x = ...
y = ...
data = pd.DataFrame({
'X-axis label': x,
'Y-axis label': y,
})
sns.jointplot(x='X-axis label', y='Y-axis label', data=data)

Related

Data visualization using Matplotlib

With this code I can generate 20 data points on the y-axis corresponding to the x-axis values, but I want to mark 25 data points on the line as downward-pointing triangles without changing arr_x=np.linspace(0.0,5.0,20) to arr_x=np.linspace(0.0,5.0,25).
Is it possible to mark additional data points on the y-axis without changing the x-axis?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
def multi_curve_plot():
    # Write your functionality below
    fig=plt.figure(figsize=(13,4))
    ax=fig.add_subplot(111)
    arr_x=np.linspace(0.0,5.0,20)
    arr_y1=np.array(arr_x)
    arr_y2=np.array(arr_x**2)
    arr_y3=np.array(arr_x**3)
    ax.set(title="Linear, Quadratic, & Cubic Equations", xlabel="arr_X",
           ylabel="f(arr_X)")
    ax.plot(arr_x, arr_y1, label="y = arr_x", color="green", marker="v")
    ax.plot(arr_x, arr_y2, label ="y = arr_x**2", color ="blue", marker="s")
    ax.plot(arr_x, arr_y3, label="y = arr_x**3", color="red", marker="o")
    plt.legend()
    return fig
multi_curve_plot()
I tried changing arr_x=np.linspace(0.0,5.0,20) to arr_x=np.linspace(0.0,5.0,25). But I want to show 25 data points on y axis without changing x-axis attributes.

Plotting fuzzy data with matplotlib

I don't know where to start, as I think it is a new approach for me. Using matplotlib with python, I would like to plot a set of fuzzy numbers (for instance a set of triangular or bell curve fuzzy numbers) as in the picture below:
You can plot the curves one by one in a loop. Here is my attempt at reproducing your example (including the overlap of labels 1 and 6):
import matplotlib.pyplot as plt
import numpy as np
# creating the figure and axis
fig, ax = plt.subplots(1,1,constrained_layout=True)
# generic gaussian
y = np.linspace(-1,1,100)
x = np.exp(-5*y**2)
center_x = (0,2,4,1,3,0,5)
center_y = (6,2,3,4,5,6,7)
# loop for all the values
for i in range(len(center_x)):
    x_c, y_c = center_x[i], center_y[i]
    # plotting the several bells, relocated to (x_c, y_c)
    ax.plot(x + x_c,y + y_c,
            color='red',linewidth=2.0)
    ax.plot(x_c,y_c,
            'o',color='blue',markersize=3)
    # adding label
    ax.annotate(
        str(i+1),
        (x_c - 0.1,y_c), # slight shift in x
        horizontalalignment='right',
        verticalalignment='center',
        color='blue',
    )
ax.grid()
Every call to ax.plot() adds points or curves (to be more precise, Artists) to the same axes. The same goes for ax.annotate(), which creates the labels.
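The same loop pattern should carry over to triangular fuzzy numbers, which the question also mentions. The sketch below is only an illustration: the triangular membership shape 1 - |y| and the three centre positions are assumptions, not taken from the original figure. It also counts the Artists on the axes to show that each call adds to the same Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(constrained_layout=True)

# Assumed triangular membership shape: 1 at the centre, 0 at a distance of 1
y = np.linspace(-1, 1, 101)
x = 1 - np.abs(y)

centers = [(0, 6), (2, 2), (4, 3)]  # hypothetical (x_c, y_c) pairs
for i, (x_c, y_c) in enumerate(centers, start=1):
    ax.plot(x + x_c, y + y_c, color="red", linewidth=2.0)  # one Line2D per triangle
    ax.plot(x_c, y_c, "o", color="blue", markersize=3)     # one Line2D per centre marker
    ax.annotate(str(i), (x_c - 0.1, y_c), ha="right", va="center", color="blue")
ax.grid()

# Every plot/annotate call has added an Artist to the same axes
n_lines = len(ax.lines)
n_texts = len(ax.texts)
print(n_lines, n_texts)
```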

Sns barplot does not sort sliced values

I want to plot from a pandas DataFrame using sns.barplot. Everything works fine:
The associated code:
result = df.groupby(['Code departement']).size().sort_values(ascending=False)
x=result.index
y=result.values
plot=sns.barplot(x, y)
plot.set(xlabel='Code departement', ylabel='Nombre de transactions')
sns.barplot(x, y, data=df).set_title('title')
But as you can see in PLOT 1, there are too many bars so I just want the 10 highest, and when I slice x and y :
x=result[:10].index
y=result[:10].values
plot=sns.barplot(x, y)
It draws the bars out of order, like this:
I checked by printing the sliced x and y, and they are in the right order. I don't know what I am missing. Thank you for your help.
You didn't state the version you are using, but probably it isn't the latest. Seaborn as well as matplotlib receive quite some improvements with each new version.
With seaborn 0.11.1 you'd get a warning, as x and y are preferred to be passed via keywords, i.e. sns.barplot(x=x, y=y). The warning tries to avoid confusion with the data= keyword. Apart from that, the numeric x-values would appear sorted numerically.
The order can be controlled via the order= keyword. In this case, sns.barplot(x=x, y=y, order=x). To only have the 10 highest, you can pass sns.barplot(x=x, y=y, order=x[:10]).
Also note that you are creating the bar plot twice (just to change the title?), which can be very confusing. As sns.barplot returns the ax (the subplot onto which the plot has been drawn), the usual approach is ax = sns.barplot(...) and then ax.set_title(...). (The name ax is preferred, as it makes it easier to see how matplotlib and seaborn example code can be reused in new code.)
The following example code has been tested with seaborn 0.11.1:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
print(sns.__version__)
df = pd.DataFrame({'Code departement': np.random.randint(1, 51, 1000)})
result = df.groupby(['Code departement']).size().sort_values(ascending=False)
x = result.index
y = result.values
ax = sns.barplot(x=x, y=y, order=x[:10])
ax.set(xlabel='Code departement', ylabel='Nombre de transactions')
ax.set_title('title')
plt.show()
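To check that order= really controls the bar sequence, the following sketch reads the tick labels back after drawing. It is a hedged illustration rather than a definitive recipe: the seeded data, the Agg backend, the use of head(10) and the conversion of the order to a plain list are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up department codes between 1 and 50
rng = np.random.default_rng(0)
df = pd.DataFrame({"Code departement": rng.integers(1, 51, size=1000)})

counts = df.groupby("Code departement").size().sort_values(ascending=False)
top10 = counts.head(10)  # the ten most frequent codes, largest first

ax = sns.barplot(x=top10.index.to_numpy(), y=top10.to_numpy(),
                 order=list(top10.index))
ax.set(xlabel="Code departement", ylabel="Nombre de transactions")

# Draw once so the tick labels are populated, then read them back
ax.figure.canvas.draw()
tick_text = [t.get_text() for t in ax.get_xticklabels()]
expected = [str(code) for code in top10.index]
print(tick_text)
```

If tick_text matches expected, the bars follow the count order rather than the numeric order of the codes.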

sns.clustermap ticks are missing

I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted the feature maps of text samples right after the convolutional layer, and for filters of size 3 I got a (filter_num)*(length_of_sentences) sized tensor.
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in:
I can't see all the ticks on the y-axis. I need them because I want to see which filters learn which information. Is there any way to properly show all the ticks on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer n, so that it plots every n-th label. We want every label, so we set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
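The effect of the integer setting can be seen by counting the tick labels. Since clustermap forwards the option to sns.heatmap, this sketch calls sns.heatmap directly, which is a simplification chosen here to keep the example small; the random data, the step of 5 and the axes names are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up 50x50 data, similar in shape to the question's example
df = pd.DataFrame(np.random.default_rng(0).normal(size=(50, 50)))

fig, (ax_sparse, ax_dense) = plt.subplots(1, 2, figsize=(10, 8))
sns.heatmap(df, yticklabels=5, cbar=False, ax=ax_sparse)  # every 5th row label
sns.heatmap(df, yticklabels=1, cbar=False, ax=ax_dense)   # every row label

n_sparse = len(ax_sparse.get_yticklabels())
n_dense = len(ax_dense.get_yticklabels())
print(n_sparse, n_dense)
```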

Tick labels displaying outside axis limits

Is there a way to automatically not display tick mark labels if they would protrude past the axis itself? For example, consider the following code
#!/usr/bin/python
import pylab as P, numpy as N, math as M
xvals=N.arange(-10,10,0.1)
yvals=[ M.sin(x) for x in xvals ]
P.plot( xvals, yvals )
P.show()
See how the -10 and 10 labels on the x-axis poke out to the left and right of the plot? And similar for the -1.0 and 1.0 labels on the y-axis. Can I automatically suppress plotting these but retain the ones that do not go outside the plot limits?
I think you could just format the axis ticks yourself and then prune the ones that are hanging over. The recommended way to deal with setting up the axis is to use the ticker API. So for example
from matplotlib.ticker import MaxNLocator
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
xvals=np.arange(-10,10,0.1)
yvals=np.sin(xvals)
ax.plot( xvals, yvals )
ax.xaxis.set_major_locator(MaxNLocator(prune='both'))
plt.show()
Here we are creating a figure and axes, plotting the data, and then setting the locator for the x-axis major ticks. The MaxNLocator locator is given the argument prune='both', which removes the first and last ticks (see the MaxNLocator documentation).
This is not exactly what you were asking for, but maybe it will solve your problem.
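To see what prune='both' does without drawing anything, one can ask the locator for its tick positions directly. This is a small sketch under the assumption that the default MaxNLocator settings are otherwise used; the range -10 to 10 mirrors the example data.

```python
from matplotlib.ticker import MaxNLocator

# Tick positions for the range [-10, 10], without and with pruning
full = MaxNLocator().tick_values(-10, 10)
pruned = MaxNLocator(prune="both").tick_values(-10, 10)

print("unpruned:", full)
print("pruned:  ", pruned)
```

The pruned result should be the unpruned one with its first and last entries dropped, which is why the edge labels no longer stick out past the axes.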