How to pass different scatter kwargs to facets in lmplot in Seaborn - matplotlib

I'm trying to map a 3rd variable to the scatter point colour in the Seaborn lmplot. So total_bill on x, tip on y and point colour as function of size.
It works when no faceting is enabled but fails when col is used because the colour array size does not match the size of the data plotted in each facet.
This is my code
import matplotlib as mpl
import seaborn as sns
sns.set(color_codes=True)
# load data
data = sns.load_dataset("tips")
# size of data
print(len(data.index))
### we want to plot scatter point colour as function of variable 'size'
# first, sort the data by 'size' so that high 'size' values are plotted
# over the smaller sizes (so they are more visible)
data = data.sort_values(by=['size'], ascending=True)
scatter_kws = dict()
# mpl.cm.get_cmap is no longer available in recent Matplotlib releases
cmap = mpl.colormaps['Blues']
# normalise 'size' variable as float range needs to be
# between 0 and 1 to map to a valid colour
scatter_kws['c'] = data['size'] / data['size'].max()
# map normalised values to colours
scatter_kws['c'] = cmap(scatter_kws['c'].values)
# colour array has same size as data
print(len(scatter_kws['c']))
# this works as intended
g = sns.lmplot(data=data, x="total_bill", y="tip", scatter_kws=scatter_kws)
The above works well and produces the following plot:
[Figure: lmplot with point colour as function of size]
However, when I add col='sex' to lmplot (code below), the colour array still has the size of the original dataset, which is larger than the number of points plotted in each facet. For example, the 'Male' facet has 157 data points, so the first 157 values of the colour array are mapped to its points (and these aren't even the correct ones):
g = sns.lmplot(data=data, x="total_bill", y="tip", col="sex", scatter_kws=scatter_kws)
[Figure: lmplot with point colour as function of size with col=sex]
Ideally, I'd like to pass an array of scatter_kws to the lmplot so that each facet uses the correct colour array (which I'd calculate before passing to lmplot). But that doesn't seem to be an option.
Any other ideas or workarounds that still allow me to use the functionality of Seaborn's lmplot (meaning, without resorting to re-creating lmplot functionality from FacetGrid)?

In principle the lmplot with different cols seems to be just a wrapper for several regplots. So instead of one lmplot we could use two regplots, one for each sex.
We therefore need to separate the original dataframe into male and female; the rest is rather straightforward.
import matplotlib.pyplot as plt
import seaborn as sns
data = sns.load_dataset("tips")
data = data.sort_values(by=['size'], ascending=True)
# make a new dataframe for males and females
male = data[data["sex"] == "Male"]
female = data[data["sex"] == "Female"]
# get normalized colors for all data
colors = data['size'].values / float(data['size'].max())
# get colors for males / females
colors_male = colors[data["sex"].values == "Male"]
colors_female = colors[data["sex"].values == "Female"]
# colors are values in [0,1] range
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9,4))
# create regplot for males, put it to left axes
# use colors_male to color the points with Blues cmap
sns.regplot(data=male, x="total_bill", y="tip", ax=ax1,
            scatter_kws={"c": colors_male, "cmap": "Blues"})
# same for females
sns.regplot(data=female, x="total_bill", y="tip", ax=ax2,
            scatter_kws={"c": colors_female, "cmap": "Greens"})
ax1.set_title("Males")
ax2.set_title("Females")
for ax in [ax1, ax2]:
    ax.set_xlim([0,60])
    ax.set_ylim([0,12])
plt.tight_layout()
plt.show()
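The male/female split above generalises to any number of facets. A minimal sketch, assuming pandas is available; `colors_per_facet` and the small `demo` frame are made-up names and data for illustration, not part of Seaborn or the tips dataset:

```python
import numpy as np
import pandas as pd


def colors_per_facet(df, value_col, facet_col):
    """Return {facet value: array of value_col scaled by the global maximum}."""
    # scale once over the whole frame so all facets share the same colour scale
    scaled = df[value_col] / float(df[value_col].max())
    # split the scaled values by facet; row order inside each group is preserved
    return {key: grp.to_numpy() for key, grp in scaled.groupby(df[facet_col])}


# made-up stand-in for the tips data
demo = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male"],
    "size": [2, 4, 6, 3, 1],
})
facet_colors = colors_per_facet(demo, "size", "sex")
```

Each array has exactly as many entries as its facet has rows, so it could be passed as scatter_kws={"c": facet_colors[key], "cmap": "Blues"} to the regplot drawn for that facet.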

Related

plotting graph of 3 parameters (PosX ,PosY) vs Time .It is a timeseries data

I am new to this module. I have time series data for the movement of a particle against time. The movement has its X and Y components against the time T. I want to plot these 3 parameters in one graph. The sample data looks like this. The first column represents time, the 2nd the X-coordinate, the 3rd the Y-coordinate.
1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156
If you want to add a 3rd piece of information to a 2D curve, one possibility is to use a color mapping that relates the value of the 3rd coordinate to a set of colors.
Matplotlib has no direct way of plotting a curve with changing color, but we can fake one using matplotlib.collections.LineCollection.
In the following I've used an arbitrary curve, but I have no doubt that you could adjust my code to your particular use case if it suits your needs.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
# e.g., a Lissajous curve
t = np.linspace(0, 2*np.pi, 6280)
x, y = np.sin(4*t), np.sin(5*t)
# to use LineCollection we need an array of segments
# the canonical answer (to upvote...) is https://stackoverflow.com/a/58880037/2749397
points = np.array([x, y]).T.reshape(-1,1,2)
segments = np.concatenate([points[:-1],points[1:]], axis=1)
# instantiate the line collection with appropriate parameters,
# the associated array controls the color mapping, we set it to time
lc = LineCollection(segments, cmap='nipy_spectral', linewidth=6, alpha=0.85)
lc.set_array(t)
# usual stuff, just note ax.autoscale, not needed here because we
# replot the same data but typically needed with ax.add_collection
fig, ax = plt.subplots()
plt.xlabel('x/mm') ; plt.ylabel('y/mm')
ax.add_collection(lc)
ax.autoscale()
cb = plt.colorbar(lc)
cb.set_label('t/s')
# we plot a thin line over the colormapped line collection, especially
# useful when our colormap contains white...
plt.plot(x, y, color='black', linewidth=0.5, zorder=3)
plt.show()
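The same idea can be applied to the time/x/y rows from the question. This is only a sketch: the six sample rows are typed in by hand (a real file would need to be loaded first, for example with np.loadtxt), the colormap is an arbitrary choice, and each segment is coloured by the time at its starting point.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# the six sample rows from the question: time, x, y
rows = np.array([
    [1.5193, 618.3349, 487.5595],
    [1.5193, 619.3349, 487.5595],
    [2.5193, 619.8688, 489.5869],
    [2.5193, 620.8688, 489.5869],
    [3.5193, 622.9027, 493.3156],
    [3.5193, 623.9027, 493.3156],
])
t, x, y = rows.T

# N points give N-1 segments, each of shape (2 points, 2 coordinates)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap="viridis", linewidth=3)
lc.set_array(t[:-1])  # one colour value per segment

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # needed: add_collection does not update the axis limits
cb = fig.colorbar(lc, ax=ax)
cb.set_label("t")
```

Calling plt.show() afterwards displays the figure.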

Geopandas reduce legend size (and remove white space below map)

I would like to know how to change the legend automatically generated by Geopandas. Mostly I would like to reduce its size because it's quite big on the generated image. The legend seems to take all the available space.
Additional question, do you know how to remove the empty space below my map ? I've tried with
pad_inches = 0, bbox_inches='tight'
but I still have an empty space below the map.
Thanks for your help.
This works for me:
some_geodataframe.plot(..., legend=True, legend_kwds={'shrink': 0.3})
Other options here: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.colorbar.html
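For a continuous legend, the legend_kwds are handed on to Matplotlib's colorbar, so the effect of shrink can be tried without any geodata. A minimal sketch with made-up random values standing in for the map:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.random((20, 30))  # made-up data, not a real map

fig, ax = plt.subplots()
mesh = ax.pcolormesh(values, cmap="copper_r")
# shrink=0.3 makes the colorbar 30% as long as it would otherwise be
cbar = fig.colorbar(mesh, ax=ax, shrink=0.3)
cbar.set_label("value")
```

Other colorbar keywords listed on the linked page, such as orientation or pad, can be tried the same way.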
To show how to get a properly sized colorbar legend for a map created by geopandas' plot() method, I use the built-in 'naturalearth_lowres' dataset.
The working code is as follows.
import matplotlib.pyplot as plt
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")] # exclude 2 no-man lands
# plot as usual, grab the axes 'ax' returned by the plot
colormap = "copper_r" # add _r to reverse the colormap
ax = world.plot(column='pop_est', cmap=colormap, \
                figsize=[12,9], \
                vmin=min(world.pop_est), vmax=max(world.pop_est))
# map marginal/face deco
ax.set_title('World Population')
ax.grid()
# colorbar will be created by ...
fig = ax.get_figure()
# add colorbar axes to the figure
# here, need trial-and-error to get [l,b,w,h] right
# l:left, b:bottom, w:width, h:height; in normalized unit (0-1)
cbax = fig.add_axes([0.95, 0.3, 0.03, 0.39])
cbax.set_title('Population')
sm = plt.cm.ScalarMappable(cmap=colormap, \
                norm=plt.Normalize(vmin=min(world.pop_est), vmax=max(world.pop_est)))
# at this stage, 'cbax' is just a blank axes, with unneeded labels on x and y axes
# blank-out the array of the scalar mappable 'sm'
sm._A = []
# draw colorbar into 'cbax'
fig.colorbar(sm, cax=cbax, format="%d")
# dont use: plt.tight_layout()
plt.show()
Read the comments in the code for useful info.

sns.clustermap ticks are missing

I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted feature maps of text samples right after the convolutional layer, and for filters of size 3, I got a tensor of size (filter_num)*(length_of_sentences).
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in a plot where I can't see all the ticks in the y-axis. This is necessary because I need to see which filters learn which information. Is there any way to properly exhibit all the ticks in the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer, so it will plot every n-th label. We want every label, so we set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()

Is there a convenient way to add a scale indicator to a plot in matplotlib?

I want to add a scale indicator to a plot like the one labelled '10kpc' in the (otherwise) empty plot below. So basically, the axes use one unit of measure and I want to indicate a length in the plot in a different unit. It has to have the same style as below, i.e. a |----| bar with text above.
Is there a convenient way in matplotlib to do that or do I have to draw three lines (two small vertical, one horizontal) and add the text? An ideal solution would not even require me to set coordinates in the data dimensions, i.e. I just say something along the line of horizontalalignment='left', verticalalignment='bottom', transform=ax.transAxes and specify only the width in data coordinates.
I fought with annotate() and arrow() and their documentation for quite a bit until I concluded they were not exactly useful, but I might be wrong.
Edit:
The code below is the closest I have come so far. I still don't like having to specify the x-coordinates in the data coordinate system. The only thing I want to specify in data units is the width of the bar. The rest should be placed in the plot system, and ideally the bar should be placed relative to the text (a few pixels above).
import matplotlib.pyplot as plt
import matplotlib.transforms as tfrms
plt.imshow(somedata)
plt.colorbar()
ax = plt.gca()
trans = tfrms.blended_transform_factory( ax.transData, ax.transAxes )
plt.errorbar( 5, 0.06, xerr=10*arcsecperkpc/2, color='k', capsize=5, transform=trans )
plt.text( 5, 0.05, '10kpc', horizontalalignment='center', verticalalignment='top', transform=trans )
Here is code that adds a horizontal scale bar (or scale indicator or scalebar) to a plot. The bar's width is given in data units, while the height of the edges is given as a fraction of the axes height.
The solution is based on an AnchoredOffsetbox, which contains a VPacker. The VPacker has a label in its lower row, and an AuxTransformBox in its upper row.
The key here is that the AnchoredOffsetbox is positioned relative to the axes, using the loc argument similar to the legend positioning (e.g. loc=4 denotes the lower right corner). However, the AuxTransformBox contains a set of elements which are positioned inside the box using a transformation. As the transformation we can choose a blended transform which transforms x coordinates according to the data transform of the axes and y coordinates according to the axes transform. A transformation which does this is actually the xaxis_transform of the axes itself. Supplying this transform to the AuxTransformBox allows us to specify the artists within (which are Line2Ds in this case) in a useful way, e.g. the line of the bar will be Line2D([0,size],[0,0]).
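The behaviour of that blended transform can be checked with a small sketch. The axis limits below are arbitrary values chosen for illustration; the point (5, 0.5) means x=5 in data units and y=0.5 in axes units.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)       # x=5 is the horizontal middle in data units
ax.set_ylim(-100, 100)   # y limits do not matter for the blended transform

trans = ax.get_xaxis_transform()  # x: data coordinates, y: axes coordinates
blended_px = trans.transform((5, 0.5))
center_px = ax.transAxes.transform((0.5, 0.5))  # centre of the axes in pixels
```

Both points land on the same pixel, whatever the y limits are, which is why only the bar's length has to be given in data units.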
All of this can be packed into a class, subclassing the AnchoredOffsetbox, such that it is easy to be used in an existing code.
import matplotlib.pyplot as plt
import matplotlib.offsetbox
from matplotlib.lines import Line2D
import numpy as np; np.random.seed(42)
x = np.linspace(-6,6, num=100)
y = np.linspace(-10,10, num=100)
X,Y = np.meshgrid(x,y)
Z = np.sin(X)/X+np.sin(Y)/Y
fig, ax = plt.subplots()
ax.contourf(X,Y,Z, alpha=.1)
ax.contour(X,Y,Z, alpha=.4)
class AnchoredHScaleBar(matplotlib.offsetbox.AnchoredOffsetbox):
    """ size: length of bar in data units
        extent : height of bar ends in axes units """
    def __init__(self, size=1, extent = 0.03, label="", loc=2, ax=None,
                 pad=0.4, borderpad=0.5, ppad = 0, sep=2, prop=None,
                 frameon=True, linekw={}, **kwargs):
        if not ax:
            ax = plt.gca()
        trans = ax.get_xaxis_transform()
        size_bar = matplotlib.offsetbox.AuxTransformBox(trans)
        line = Line2D([0,size],[0,0], **linekw)
        vline1 = Line2D([0,0],[-extent/2.,extent/2.], **linekw)
        vline2 = Line2D([size,size],[-extent/2.,extent/2.], **linekw)
        size_bar.add_artist(line)
        size_bar.add_artist(vline1)
        size_bar.add_artist(vline2)
        # the minimumdescent argument was dropped here, since recent
        # Matplotlib releases no longer accept it
        txt = matplotlib.offsetbox.TextArea(label)
        self.vpac = matplotlib.offsetbox.VPacker(children=[size_bar,txt],
                                 align="center", pad=ppad, sep=sep)
        matplotlib.offsetbox.AnchoredOffsetbox.__init__(self, loc, pad=pad,
                 borderpad=borderpad, child=self.vpac, prop=prop, frameon=frameon,
                 **kwargs)
ob = AnchoredHScaleBar(size=3, label="3 units", loc=4, frameon=True,
pad=0.6,sep=4, linekw=dict(color="crimson"),)
ax.add_artist(ob)
plt.show()
In order to achieve a result as desired in the question, you can set the frame off and adjust the linewidth. Of course the transformation from the units you want to show (kpc) into data units (km?) needs to be done by yourself.
ikpc = lambda x: x*3.085e16 #x in kpc, return in km
ob = AnchoredHScaleBar(size=ikpc(10), label="10kpc", loc=4, frameon=False,
pad=0.6,sep=4, linekw=dict(color="k", linewidth=0.8))
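If the |----| end caps are not essential, a different technique from the class above is the AnchoredSizeBar that ships with Matplotlib's mpl_toolkits. It draws a plain bar with a label and takes its length in the units of the transform passed in. A minimal sketch with a made-up line plot:

```python
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 5])  # made-up data, x axis spans 0 to 10

# a bar 3 data units long, anchored in the lower right corner (loc=4)
bar = AnchoredSizeBar(ax.transData, 3, "3 units", loc=4,
                      frameon=False, label_top=True)
ax.add_artist(bar)
```

The conversion from the displayed unit (kpc) to data units still has to be done by hand before passing the size.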

Matplotlib plotting a single line that continuously changes color

I would like to plot a curve in the (x,y) plane, where the color of the curve depends on a value of another variable T. x is a 1D numpy array, y is a 1D numpy array.
T=np.linspace(0,1,np.size(x))**2
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y)
I want the line to change from blue to red (using RdBu colormap) depending on the value of T (one value of T exists for every (x,y) pair).
I found this, but I don't know how to adapt it to my simple example. How would I use the LineCollection for my example? http://matplotlib.org/examples/pylab_examples/multicolored_line.html
Thanks.
One idea could be to set the color using color=(R,G,B), then split your plot into n segments and continuously vary one of R, G or B (or a combination of them).
import pylab as plt
import numpy as np
# Make some data
n=1000
x=np.linspace(0,100,n)
y=np.sin(x)
# Your coloring array
T=np.linspace(0,1,np.size(x))**2
fig = plt.figure()
ax = fig.add_subplot(111)
# Segment plot and color depending on T
s = 10 # Segment length
for i in range(0,n-s,s):
    ax.plot(x[i:i+s+1],y[i:i+s+1],color=(0.0,0.5,T[i]))
Hope this is helpful
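To get the blue-to-red transition asked for in the question, the hand-built (R,G,B) tuple can be swapped for a colormap lookup. A sketch under the same made-up data; "RdBu_r" is the reversed RdBu map, chosen here so that low T is blue and high T is red:

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data, as in the answer above
n = 1000
x = np.linspace(0, 100, n)
y = np.sin(x)
T = np.linspace(0, 1, n) ** 2

cmap = plt.get_cmap("RdBu_r")  # maps a number in [0, 1] to an RGBA tuple

fig, ax = plt.subplots()
s = 10  # segment length in points
for i in range(0, n - 1, s):
    # overlap by one point so neighbouring segments join without gaps
    ax.plot(x[i:i + s + 1], y[i:i + s + 1], color=cmap(T[i]))
```

Shorter segments give a smoother gradient at the cost of more plot calls; a LineCollection avoids that cost.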