sns.clustermap ticks are missing - matplotlib

I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted the feature maps of text samples right after the convolutional layer, and for the size-3 filters I got a (filter_num)*(length_of_sentences) sized tensor.
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code produces a plot in which I can't see all the ticks on the y-axis. I need every tick because I want to see which filters learn which information. Is there any way to properly show all the ticks on the y-axis?

Keyword arguments given to sns.clustermap get passed on to sns.heatmap, which has an option yticklabels. Its documentation, which is shared with xticklabels, states:
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer n, so that only every n-th label is plotted. We want every label, so we set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
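To check which original row (filter) ends up at which position after clustering, something like the following minimal sketch may help. It assumes the same random 50x50 frame as above, and the Agg backend is selected only so that it runs without a display. The row order is read from g.dendrogram_row.reordered_ind, which seaborn sets when row_cluster=True.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(-np.random.randn(50, 50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)

# With yticklabels=1 there is one tick label per row, in clustered order
labels = [t.get_text() for t in g.ax_heatmap.get_yticklabels()]

# Original row positions in the order chosen by the row dendrogram
row_order = list(g.dendrogram_row.reordered_ind)

print(len(labels), row_order[:5])
```

Since the index here is simply range(50), the tick labels and row_order should list the same numbers in the same order.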

Related

Loop over columns in a dataframe to produce histograms by category

I would like to investigate the frequency distribution of all the features (columns) in my df based on the outcome variable (target column). Having searched the solutions, I find this beautiful snippet from here which loop over features and generate histograms for features in the cancer dataset from Scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
# from matplotlib.pyplot import matplotlib
cancer = load_breast_cancer() # the cancer dataset from scikit-learn
fig,axes =plt.subplots(10,3, figsize=(12, 9)) # 3 columns each containing 10 figures, total 30 features
malignant=cancer.data[cancer.target==0] # define malignant
benign=cancer.data[cancer.target==1] # define benign
ax=axes.ravel()# flat axes with numpy ravel
for i in range(30):
    _,bins=np.histogram(cancer.data[:,i],bins=40)
    ax[i].hist(malignant[:,i],bins=bins,color='r',alpha=.5)
    ax[i].hist(benign[:,i],bins=bins,color='g',alpha=0.3)
    ax[i].set_title(cancer.feature_names[i],fontsize=9)
    ax[i].axes.get_xaxis().set_visible(False) # the x-axis co-ordinates are not so useful, as we just want to look how well separated the histograms are
    ax[i].set_yticks(())
ax[0].legend(['malignant','benign'],loc='best',fontsize=8)
plt.tight_layout()# let's make good plots
plt.show()
Assuming that I have a df with all the features and the target variable organised across successive columns, how can I loop over my columns to reproduce these histograms? One solution that I have considered is a groupby method:
df.groupby("class").col01.plot(kind='kde', ax=axs[1])
Any ideas are much appreciated!
Actually you can use sns.FacetGrid for this:
import numpy as np
import pandas as pd
import seaborn as sns
# Random data:
np.random.seed(1)
df = pd.DataFrame(np.random.uniform(0,1,(100,6)), columns=list('ABCDEF'))
df['class'] = np.random.choice([0,1], p=[0.3,0.7], size=len(df))
# plot
g = sns.FacetGrid(df.melt(id_vars='class'),
                  col='variable',
                  hue='class',
                  col_wrap=3) # change this to your liking
g = g.map(sns.kdeplot, "value", alpha=0.5)
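If histograms in a plain loop are preferred over KDE facets, a rough matplotlib-only sketch could look like this. It assumes a frame shaped like the random one above (six feature columns plus a 'class' column); the 2x3 grid and the 20 bins are arbitrary choices, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data: six features and a binary target column
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 1, (100, 6)), columns=list("ABCDEF"))
df["class"] = rng.choice([0, 1], p=[0.3, 0.7], size=len(df))

features = [c for c in df.columns if c != "class"]
fig, axes = plt.subplots(2, 3, figsize=(9, 5))

for ax, col in zip(axes.ravel(), features):
    # Shared bin edges so both classes are directly comparable
    _, bins = np.histogram(df[col], bins=20)
    for label, group in df.groupby("class"):
        ax.hist(group[col], bins=bins, alpha=0.5, label=f"class {label}")
    ax.set_title(col, fontsize=9)
    ax.set_yticks(())

axes.ravel()[0].legend(loc="best", fontsize=8)
fig.tight_layout()
```

Computing the bin edges once per feature, as in the scikit-learn snippet from the question, keeps the two class histograms aligned.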

Plotting a graph of 3 parameters (PosX, PosY) vs Time for time series data

I am new to this module. I have time series data for the movement of a particle against time. The movement has an X and a Y component against the time T. I want to plot these 3 parameters in one graph. The sample data looks like this. The first column represents time, the second the X coordinate, and the third the Y coordinate.
1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156
If you want to add a third piece of information to a 2D curve, one possibility is to use a color mapping that relates the value of the third coordinate to a set of colors.
Matplotlib has no direct way of plotting a curve with changing color, but we can fake one using matplotlib.collections.LineCollection.
In the following I've used an arbitrary curve, but I have no doubt that you could adjust my code to your particular use case if it suits your needs.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
# e.g., a Lissajous curve
t = np.linspace(0, 2*np.pi, 6280)
x, y = np.sin(4*t), np.sin(5*t)
# to use LineCollection we need an array of segments
# the canonical answer (to upvote...) is https://stackoverflow.com/a/58880037/2749397
points = np.array([x, y]).T.reshape(-1,1,2)
segments = np.concatenate([points[:-1],points[1:]], axis=1)
# instantiate the line collection with appropriate parameters,
# the associated array controls the color mapping, we set it to time
lc = LineCollection(segments, cmap='nipy_spectral', linewidth=6, alpha=0.85)
lc.set_array(t)
# usual stuff, just note ax.autoscale, not needed here because we
# replot the same data but typically needed with ax.add_collection
fig, ax = plt.subplots()
plt.xlabel('x/mm') ; plt.ylabel('y/mm')
ax.add_collection(lc)
ax.autoscale()
cb = plt.colorbar(lc)
cb.set_label('t/s')
# we plot a thin line over the colormapped line collection, especially
# useful when our colormap contains white...
plt.plot(x, y, color='black', linewidth=0.5, zorder=3)
plt.show()
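Applied to the sample rows from the question, the same idea might look like the sketch below. The six rows are embedded directly for illustration (reading them from a file is left out), the colormap and line width are arbitrary, and each segment is colored by the time at its starting point, which is one possible convention among several.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# The six sample rows from the question: time, x, y
data = np.array([
    [1.5193, 618.3349, 487.5595],
    [1.5193, 619.3349, 487.5595],
    [2.5193, 619.8688, 489.5869],
    [2.5193, 620.8688, 489.5869],
    [3.5193, 622.9027, 493.3156],
    [3.5193, 623.9027, 493.3156],
])
t, x, y = data.T

# n points give n-1 segments, each of shape (2, 2)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap="viridis", linewidth=4)
lc.set_array(t[:-1])  # color each segment by the time at its start

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # needed here, since add_collection does not rescale the view
ax.set_xlabel("PosX")
ax.set_ylabel("PosY")
cb = fig.colorbar(lc, ax=ax)
cb.set_label("Time")
```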

Customize the axis label in seaborn jointplot

I seem to have got stuck on a relatively simple problem, but I couldn't fix it after searching for the last hour and a lot of experimenting.
I have two numpy arrays x and y and I am using seaborn's jointplot to plot them:
sns.jointplot(x=x, y=y)
Now I want to label the x-axis and y-axis as "X-axis label" and "Y-axis label" respectively. If I use plt.xlabel, the labels go to the marginal distribution. How can I make them appear on the joint axes?
sns.jointplot returns a JointGrid object, which gives you access to the matplotlib axes and you can then manipulate from there.
import seaborn as sns
import numpy as np
# example data
X = np.random.randn(1000,)
Y = 0.2 * np.random.randn(1000) + 0.5
h = sns.jointplot(x=X, y=Y) # recent seaborn versions require keyword arguments here
# JointGrid has a convenience function
h.set_axis_labels('x', 'y', fontsize=16)
# or set labels via the axes objects
h.ax_joint.set_xlabel('new x label', fontweight='bold')
# also possible to manipulate the histogram plots this way, e.g.
h.ax_marg_y.grid(True) # with ugly consequences...
# labels appear outside of plot area, so auto-adjust
h.figure.tight_layout()
(The problem with your attempt is that functions such as plt.xlabel("text") operate on the current axis, which is not the central one in sns.jointplot; but the object-oriented interface is more specific as to what it will operate on).
Note that the last command uses the figure attribute of the JointGrid. The initial version of this answer used the simpler - but not object-oriented - approach via the matplotlib.pyplot interface.
To use the pyplot interface:
import matplotlib.pyplot as plt
plt.tight_layout()
Alternatively, you can specify the axes labels in a pandas DataFrame in the call to jointplot.
import pandas as pd
import seaborn as sns
x = ...
y = ...
data = pd.DataFrame({
'X-axis label': x,
'Y-axis label': y,
})
sns.jointplot(x='X-axis label', y='Y-axis label', data=data)

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
    axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisible except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_facecolor('none') # set_axis_bgcolor is no longer available in recent Matplotlib
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top=False, bottom=False, left=False, right=False)
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
1. Create the first figure and plot 3 separate axes using a loop.
2. Plot a single axis in the same figure to sit on top of the graphs drawn previously. Label the x and y axes. Make all other aspects of this axis invisible.
3. Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and explain how to get the whole image to save to a single file.
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During the runtime of a script you can have as many figures as you wish. Calling plt.savefig() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a reference to the figure object fig1:
fig1.savefig("output.png")
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
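As a rough illustration of both routes, the sketch below keeps a reference to each figure, saves them as separate PNG files, and then writes both into one multi-page PDF using matplotlib's PdfPages. The file names, the temporary output directory and the random data are assumptions made for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

# First figure: three stacked axes
fig1, axes1 = plt.subplots(3, sharex=True, sharey=True)
for ax in axes1:
    ax.plot(np.random.randn(100), "b-")

# Second figure: a single axes; this is now the "current" figure
fig2, ax2 = plt.subplots()
ax2.plot(np.random.randn(100), color="green")

out_dir = tempfile.mkdtemp()  # hypothetical output location
png1 = os.path.join(out_dir, "figure1.png")
png2 = os.path.join(out_dir, "figure2.png")
pdf_path = os.path.join(out_dir, "both_figures.pdf")

# Saving through the figure objects does not depend on which one is current
fig1.savefig(png1, dpi=100)
fig2.savefig(png2, dpi=100)

# One PDF file, one page per figure
with PdfPages(pdf_path) as pdf:
    pdf.savefig(fig1)
    pdf.savefig(fig2)
```

Here plt.savefig() on its own would have written only fig2, since it was created last.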
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = fig.add_subplot(611)
ax2 = fig.add_subplot(612)
ax3 = fig.add_subplot(613)
ax4 = fig.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide Customizing Location of Subplot Using GridSpec
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).
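For the layout in the question (three small axes above one larger one), a GridSpec-based sketch might look like the following. The 6-row grid mirrors the 611/612/613/212 arrangement above; the figure size and the random data are arbitrary, and the Agg backend is only an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(100)
fig = plt.figure(figsize=(6, 8))
gs = fig.add_gridspec(6, 1)  # 6 rows, 1 column

# Top half: three axes, one grid row each
top_axes = [fig.add_subplot(gs[i, 0]) for i in range(3)]
# Bottom half: one axes spanning the remaining three rows
bottom_ax = fig.add_subplot(gs[3:, 0])

for ax in top_axes:
    ax.plot(x, np.random.randn(x.size), "b-")
bottom_ax.plot(x, np.random.randn(x.size), color="green")

fig.tight_layout()
```

Because everything lives in one figure, a single fig.savefig(...) call would then write the whole image to one file.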

secondary Y axis position matplotlib

I need to change the secondary Y axis position on a matplotlib plot.
It's like a subplot inside the same plot.
In the image below, my secondary Y axis starts at the same position as the first Y axis. I need the secondary Y axis to start at about the "18" position of the first Y axis, with a smaller scale (red line).
If I understand the question, you want a twinx axis, as @kikocorreoso says, but you also want to compress it, so it only takes up the upper portion of the y axis.
You can do this by just setting the ylim larger than you need it, and explicitly setting the yticks. Here's an example with some random data
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(np.random.randint(0,5),4,25) for _ in range(25)] # some random data
fig=plt.figure()
ax1=fig.add_subplot(111)
ax2=ax1.twinx()
ax1.set_ylim(-5,25)
ax2.set_ylim(0,14)
ax2.set_yticks([10,12,14]) # ticks below 10 don't show up
ax1.boxplot(data)
ax2.plot(np.linspace(0,26,50),12.+2.*np.sin(np.linspace(0,2.*np.pi,50))) # just a random line
plt.show()
If I understood the figure you posted correctly, you want a second y-axis. You can do this using plt.twinx. An example could look like the following:
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.twinx()
plt.plot([5,4,5])
plt.show()