Align multi-line ticks in Seaborn plot - matplotlib

I have the following heatmap:
I've broken up the category names by each capital letter and then capitalised them. This achieves a centering effect across the labels on my x-axis by default which I'd like to replicate across my y-axis.
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
            cmap="Blues", annot_kws=annot_kws,
            mask=mask, vmin=0, vmax=5000,
            cbar_kws={"shrink": .8}, square=True,
            linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't seem to find an align parameter for setting the ticks and was wondering if and how this centering could be achieved in seaborn/matplotlib?

I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
                 columns=['Donald\nDuck','Mickey\nMouse','Han\nSolo',
                          'Luke\nSkywalker','Yoda','Santa\nClause','Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops above. They get each tick label and then set its horizontal alignment. (You can also change the vertical alignment with set_va().)
The code above produces this:
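If the lines inside each label should also be centred relative to each other, one option is Text.set_multialignment(). This goes beyond the answer above, so treat it as a minimal sketch; the tick positions and label strings are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_yticks([0, 1, 2])
ax.set_yticklabels(["DONALD\nDUCK", "YODA", "LUKE\nSKYWALKER"])  # made-up labels

labels = ax.get_yticklabels()
for label in labels:
    label.set_ha("right")               # anchor the whole text block at the tick
    label.set_va("center")              # centre the block vertically on the tick
    label.set_multialignment("center")  # centre the lines within the block

h_alignments = [label.get_horizontalalignment() for label in labels]
```

Here set_ha() positions the whole text block against the tick, while set_multialignment() only affects how the individual lines sit inside that block.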

Related

Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is oriented horizontally and there is no equivalent layout parameter to make them vertical to my knowledge. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
             x_vars = X_train.select_dtypes(exclude=['object']).columns,
             y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()
What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
                            np.random.randn(100).cumsum() + np.random.randint(100, 1000)
                        for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
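To make the "long form" requirement concrete, here is a tiny sketch of what melt does to a wide dataframe. The column names LotArea and YearBuilt and all the values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A small "wide" frame: one column per variable (hypothetical names and values)
wide = pd.DataFrame({
    "SalePrice": [100, 200],
    "LotArea": [10, 20],
    "YearBuilt": [1990, 2005],
})

# Long form: one row per (SalePrice, variable, value) combination
long_df = wide.melt(id_vars="SalePrice", value_vars=["LotArea", "YearBuilt"])
print(long_df)
```

Each original column named in value_vars becomes a set of rows, with its name stored in the 'variable' column. That is the column lmplot uses for col='variable'.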
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
                facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()

same colorbar/colormap for all subplots [duplicate]

I want to make 4 imshow subplots, all sharing the same colormap. Matplotlib automatically adjusts the scale on the colormap depending on the entries of the matrices. For example, if one of my matrices has all entries equal to 10 and the other one has all entries equal to 5 and I use the Greys colormap, then one of my subplots should be completely black and the other one should be completely grey. But both of them end up completely black. How can I make all the subplots share the same scale on the colormap?
To get this right you need to have all the images with the same intensity scale, otherwise the colorbar() colours are meaningless. To do that, use the vmin and vmax arguments of imshow(), and make sure they are the same for all your images.
E.g., if the range of values you want to show goes from 0 to 10, you can use the following:
import matplotlib.pyplot as plt
import numpy as np
my_image1 = np.linspace(0, 10, 10000).reshape(100,100)
my_image2 = np.sqrt(my_image1.T) + 3
plt.subplot(1, 2, 1)
plt.imshow(my_image1, vmin=0, vmax=10, cmap='jet', aspect='auto')
plt.subplot(1, 2, 2)
plt.imshow(my_image2, vmin=0, vmax=10, cmap='jet', aspect='auto')
plt.colorbar()
When the ranges of the data sets (data1 and data2) are unknown and you want to use the same colour bar for both/all plots, find the overall minimum and maximum and use them as vmin and vmax in the calls to imshow:
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=1, ncols=2)
# generate randomly populated arrays
data1 = np.random.rand(10,10)*10
data2 = np.random.rand(10,10)*10 -7.5
# find minimum of minima & maximum of maxima
minmin = np.min([np.min(data1), np.min(data2)])
maxmax = np.max([np.max(data1), np.max(data2)])
im1 = axes[0].imshow(data1, vmin=minmin, vmax=maxmax,
                     extent=(-5,5,-5,5), aspect='auto', cmap='viridis')
im2 = axes[1].imshow(data2, vmin=minmin, vmax=maxmax,
                     extent=(-5,5,-5,5), aspect='auto', cmap='viridis')
# add space for colour bar
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.88, 0.15, 0.04, 0.7])
fig.colorbar(im2, cax=cbar_ax)
It may be that you don't know the ranges of your data beforehand, but you do know that they are somehow compatible. In that case, you may prefer to let matplotlib choose the range for the first plot and use the same range for the remaining plots. Here is how you can do it. The key is to get the limits with properties()['clim']:
import numpy as np
import matplotlib.pyplot as plt
my_image1 = np.linspace(0, 10, 10000).reshape(100,100)
my_image2 = np.sqrt(my_image1.T) + 3
fig, axes = plt.subplots(nrows=1, ncols=2)
im = axes[0].imshow(my_image1)
clim=im.properties()['clim']
axes[1].imshow(my_image2, clim=clim)
fig.colorbar(im, ax=axes.ravel().tolist(), shrink=0.5)
plt.show()
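As a variation on the approaches above, a single Normalize instance can be passed to every imshow call via norm=, so all images are guaranteed to share one scale. This is only a sketch using the 10-versus-5 example from the question; the 0 to 10 range is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

# Two constant images, as in the question: all 10s and all 5s
data1 = np.full((4, 4), 10.0)
data2 = np.full((4, 4), 5.0)

# One shared normalisation (the 0..10 range is assumed for illustration)
norm = Normalize(vmin=0, vmax=10)

fig, axes = plt.subplots(nrows=1, ncols=2)
im1 = axes[0].imshow(data1, cmap="Greys", norm=norm)
im2 = axes[1].imshow(data2, cmap="Greys", norm=norm)

# Either image can drive the colorbar, because both use the same norm
fig.colorbar(im1, ax=axes.ravel().tolist(), shrink=0.5)
```

With a shared norm, the first image is drawn black and the second mid-grey, instead of both being autoscaled independently.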

how to set the distance between bars and axis using matplot lib [duplicate]

I am currently learning how to import data and work with it in matplotlib, and I am having trouble even though I have the exact code from the book.
This is what the plot looks like, but my question is how can I get it where there is no white space between the start and the end of the x-axis.
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
        #print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
An automatic margin is set at the edges, which ensures that the data fit nicely within the axis spines. In this case such a margin is probably desired on the y axis. By default it is set to 0.05, in units of the axis span.
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (same for y of course). If you want to get rid of the margin entirely and forever, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
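To see numerically what the margin does, a minimal sketch can compare the axis limits before and after calling margins(x=0). It assumes the default rc settings; the data are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 1])  # made-up data spanning x = 0..10

xlim_default = ax.get_xlim()  # limits padded by the default x margin
ax.margins(x=0)               # remove the padding on the x axis only
xlim_tight = ax.get_xlim()    # limits now coincide with the data range

print(xlim_default, xlim_tight)
```

The y axis keeps its default margin here, since only x is passed to margins().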
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = [i for i in range(1, max_x_value + 1)]
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs

How can I set boxplot color by rainbow in matplotlib

I want to create a boxplot to compare data. My plot looks like this:
How can I add color like this?
You can color the boxes following this example. Beyond that, you will need to map the data you have in mind to colors on the "rainbow" colormap with this module. Here is an example with random test data, in which I map the colors to the means.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Random test data
test_data = [np.random.normal(mean, 1, 100) for mean in range(50)]
fig, axes = plt.subplots(figsize=(12, 16))
# Horizontal box plot
bplot = axes.boxplot(test_data,
                     vert=False,         # draw the boxes horizontally
                     patch_artist=True)  # fill with color
# Fill with colors
cmap = cm.ScalarMappable(cmap='rainbow')
test_mean = [np.mean(x) for x in test_data]
for patch, color in zip(bplot['boxes'], cmap.to_rgba(test_mean)):
    patch.set_facecolor(color)
plt.show()
A colormap object is itself a function that accepts values between 0 and 1, so you can call it on your data after "normalising" them to that range. Using the matplotlib example on boxplots:
import matplotlib.pyplot as plt
import numpy as np
# Random test data
np.random.seed(123)
all_data = [np.random.normal(0, 5, 100) for std in range(1, 21)]
fig, ax = plt.subplots(nrows=1, figsize=(9, 4))
# rectangular box plot
bplot = ax.boxplot(all_data, notch=False, sym='', vert=False, patch_artist=True)
cm = plt.get_cmap('rainbow')  # plt.cm.get_cmap was removed in newer matplotlib releases
colors = [cm(val/len(all_data)) for val in range(len(all_data))]
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)
plt.show()
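The normalising step can also be written out explicitly with matplotlib.colors.Normalize, which rescales arbitrary values to the 0 to 1 range the colormap expects. This is a small sketch with made-up values, not part of either answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

values = np.array([2.0, 5.0, 8.0])  # made-up data, e.g. one mean per box

# Rescale the data linearly so that min -> 0 and max -> 1
norm = Normalize(vmin=values.min(), vmax=values.max())

# Calling the colormap on the normalised values yields one RGBA row per value
cmap = plt.get_cmap("rainbow")
colors = cmap(norm(values))
```

Each row of colors can then be passed to patch.set_facecolor() in the same zip loop as above.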

"panel barchart" in matplotlib

I would like to produce a figure like this one using matplotlib:
(source: peltiertech.com)
My data are in a pandas DataFrame, and I've gotten as far as a regular stacked barchart, but I can't figure out how to do the part where each category is given its own y-axis baseline.
Ideally I would like the vertical scale to be exactly the same for all the subplots and move the panel labels off to the side so there can be no gaps between the rows.
I haven't exactly replicated what you want but this should get you pretty close.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
#create dummy data
cols = ['col'+str(i) for i in range(10)]
ind = ['ind'+str(i) for i in range(10)]
df = pd.DataFrame(np.random.normal(loc=10, scale=5, size=(10, 10)), index=ind, columns=cols)
#create plot
sns.set_style("whitegrid")
axs = df.plot(kind='bar', subplots=True, sharey=True,
              figsize=(6, 5), legend=False, yticks=[],
              grid=False, ylim=(0, 14), edgecolor='none',
              fontsize=14, color=[sns.xkcd_rgb["brownish red"]])
plt.text(-1, 100, "The y-axis label", fontsize=14, rotation=90)  # add a y-label with custom positioning
sns.despine(left=True)  # get rid of the axes
for ax in axs:  # set the names beside the axes
    ax.lines[0].set_visible(False)  # remove ugly dashed line
    ax.set_title('')
    sername = ax.get_legend_handles_labels()[1][0]
    ax.text(9.8, 5, sername, fontsize=14)
plt.suptitle("My panel chart", fontsize=18)