Ticks position in heatmap with categorical data (seaborn) - matplotlib

I am trying to plot a confusion matrix of my predictions. My data is multi-class (13 different labels) so I'm using a heatmap.
As you can see below, my heatmap looks generally okay, but the labels are a bit out of position: the y ticks should be a little lower and the x ticks a bit more to the right. I want to move the ticks on both axes so they are aligned with the center of each square.
my code:
sns.set()
my_mask = np.zeros((con_matrix.shape[0], con_matrix.shape[0]), dtype=int)
for i in range(con_matrix.shape[0]):
    for j in range(con_matrix.shape[0]):
        my_mask[i][j] = con_matrix[i][j] == 0
fig_dims = (10, 10)
plt.subplots(figsize=fig_dims)
ax = sns.heatmap(con_matrix, annot=True, fmt="d", linewidths=.5, cmap="Pastel1", cbar=False, mask=my_mask, vmax=15)
plt.xticks(range(len(party_names)), party_names, rotation=45)
plt.yticks(range(len(party_names)), party_names, rotation='horizontal')
plt.show()
and for reproduction purpose, here are con_matrix and party_names hard-coded:
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
con_matrix = np.array([[55, 0, 0, 0,0, 0, 0,0,0,0,0,0,2], [0,199,0,0,0,0,0,0,0,0,2,0,1],
[0, 0,52,0,0,0,0,0,0,0,0,0,1],
[0,0,0,39,0,0,0,0,0,0,0,0,0],
[0,0,0,0,90,0,0,0,0,0,0,4,3],
[0,0,0,1,0,35,0,0,0,0,0,0,0],
[0,0,0,0,5,0,26,0,0,1,0,1,0],
[0,5,0,0,0,1,0,44,0,0,3,0,1],
[0,1,0,0,0,0,0,0,52,0,0,0,0],
[0,1,0,0,2,0,0,0,0,235,0,1,1],
[1,2,0,0,0,0,0,3,0,0,34,0,3],
[0,0,0,0,5,0,0,0,0,1,0,40,0],
[0,0,0,0,0,0,0,0,0,1,0,0,46]])
party_names = ['Blues', 'Browns', 'Greens', 'Greys', 'Khakis', 'Oranges', 'Pinks', 'Purples', 'Reds', 'Turquoises', 'Violets', 'Whites', 'Yellows']
I already tried to work with the position argument of different axes, but it did not turn out well. I could not find an exact answer on this site either (at least not a solution that works for categorical data).
I'm new to visualization with seaborn, so any improvement with explanations would be appreciated (not only for my problem but for my code and visualization as well).

You can shift the tick positions on both axes by 0.5 so that the labels sit at the center of each cell. I used NumPy's arange because it lets you add 0.5 to the whole array at once.
plt.xticks(np.arange(len(party_names))+0.5, party_names, rotation=45)
plt.yticks(np.arange(len(party_names))+0.5, party_names, rotation='horizontal')
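The offset is 0.5 because the heatmap draws cell i between i and i + 1 on each axis, so its center is at i + 0.5. Below is a minimal sketch of that idea using plain matplotlib's pcolormesh as a stand-in for sns.heatmap, so it runs without seaborn. The 3x3 matrix and the names demo_matrix and demo_labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 3x3 matrix and labels, only for illustration.
demo_matrix = np.array([[5, 0, 1], [0, 7, 0], [2, 0, 9]])
demo_labels = ["Blues", "Browns", "Greens"]

fig, ax = plt.subplots()
ax.pcolormesh(demo_matrix, cmap="Pastel1")  # cell i spans [i, i + 1] on both axes
ax.invert_yaxis()  # put the first row on top, as a heatmap does

# Tick positions at the cell centers: 0.5, 1.5, 2.5
centers = np.arange(len(demo_labels)) + 0.5
ax.set_xticks(centers)
ax.set_xticklabels(demo_labels, rotation=45)
ax.set_yticks(centers)
ax.set_yticklabels(demo_labels, rotation="horizontal")
```

Call plt.show() or fig.savefig(...) afterwards to look at the result.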

Related

Pyplot axis limits within boundaries

Is there an easy way to avoid pyplot zooming far into noisy data?
Something like a lower boundary for the axis limits.
I am not trying to set a fixed boundary on my axis, as this would fully disable automatic scaling.
Maybe a "minimum tick distance" would also work.
Right now I am using an additional 'invisible' plot in my graph that will define the maximum zoom.
Some example that illustrates what I want to achieve:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 100, 1)
noise = np.random.randn(len(x))*0.1
y = 10+noise
y_dummy_low = [0]*len(x)
y_dummy_high = [20]*len(x)
plt.figure()
plt.plot(x, y) # noisy data I actually want to plot
plt.plot(x, y_dummy_low, y_dummy_high, marker="None", linestyle="None") # this will avoid zooming too much
plt.show()
[Figures: "Zooming too far" and "Zooming OK"]
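One way to express the "lower boundary for the axis limits" idea without a dummy plot is to let autoscaling run first and then widen the y-range only if it came out narrower than some minimum span. This is a sketch under that assumption. enforce_min_span and min_span are hypothetical names chosen here, not a matplotlib feature.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt


def enforce_min_span(ax, min_span):
    """Widen the y-limits symmetrically if autoscaling zoomed in closer than min_span."""
    low, high = ax.get_ylim()
    if high - low < min_span:
        center = 0.5 * (low + high)
        ax.set_ylim(center - min_span / 2, center + min_span / 2)


x = np.arange(0, 100, 1)
y = 10 + np.random.randn(len(x)) * 0.1

fig, ax = plt.subplots()
ax.plot(x, y)
# Data that already spans more than 20 units would keep its autoscaled limits.
enforce_min_span(ax, min_span=20)
```

Since set_ylim switches off autoscaling for that axis, the helper should be called after all data has been plotted.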

Problem with text and annotation x and y coordinates changing while looping through subplots matplotlib

I would like to iterate through subplots, plot data, and annotate the subplots with either the text function or the annotation function in matplotlib. Both functions ask for x and y coordinates in order to place text or annotations. I can get this to work fine, until I plot data. Then the annotations and the text jump all over the place and I cannot figure out why.
My set up is something like this, which produces well-aligned annotations with no data:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
fig, ax=plt.subplots(nrows=3, ncols=3, sharex=True)
fig.suptitle('Axes ylim unpacking error demonstration')
annotation_colors=["red", "lightblue", "tan", "purple", "lightgreen", "black", "pink", "blue", "magenta"]
for jj, ax in enumerate(ax.flat):
    bott, top = plt.ylim()
    left, right = plt.xlim()
    ax.text(left+0.1*(right-left), bott+0.1*(top-bott), 'Annotation', color=annotation_colors[jj])
plt.show()
When I add random data (or my real data), the annotations jump:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Same as above but with 9 random data frames plotted.
df_cols = ['y' + str(x) for x in range(1,10)]
df=pd.DataFrame(np.random.randint(0,10, size=(10,9)), columns=df_cols)
df['x']=range(0,10)
#Make a few columns much larger in terms of magnitude of mean values
df['y2']=df['y2']*-555
df['y5']=df['y5']*123
fig, ax=plt.subplots(nrows=3, ncols=3, sharex=True)
fig.suptitle('Axes ylim unpacking error demonstration')
annotation_colors=["red", "lightblue", "tan", "purple", "lightgreen", "black", "pink", "blue", "magenta"]
for jj, ax in enumerate(ax.flat):
    ax.plot(df['x'], df['y'+str(jj+1)], color=annotation_colors[jj])
    bott, top = plt.ylim()
    left, right = plt.xlim()
    ax.text(left+0.1*(right-left), bott+0.1*(top-bott), 'Annotation', color=annotation_colors[jj])
plt.show()
This is just to demonstrate the issue, which is likely caused by my lack of understanding of how the ax and fig calls work. It seems to me that the x and y coordinates of the ax.text call may actually apply to the coordinates of the fig, or something similar. The end result is far worse with my actual data! In that case, some of the annotations end up miles above the actual plots and not even within the coordinates of any of the subplot axes. Others completely overlap! What am I misunderstanding?
Edit for more details:
I have tried Stef's solution of using axes coordinates of axes.text(0.1, 0.1, 'Annotation'...)
I get the following plot, which still shows the same problem of moving the text all over the place. Because I am running this example with random numbers, the annotations are moving randomly with every run - i.e. they are not just displaced in the subplots with different axis ranges (y2 and y5).
You can specify the text location in axes coordinates (as opposed to data coordinates, as you did implicitly):
ax.text(.1, .1, 'Annotation', color=annotation_colors[jj], transform=ax.transAxes)
See the Transformations Tutorial for further information.
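Two things interact in the question's loop. plt.ylim() and plt.xlim() read the limits of pyplot's current axes, not those of the ax being iterated over, so the computed data coordinates do not belong to the subplot that receives the text. Passing transform=ax.transAxes avoids the limits altogether. The sketch below shows both fixes side by side on made-up data. The 2x2 grid and the texts list are choices made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit for interactive use
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True)

texts = []
for jj, ax in enumerate(axes.flat):
    # Made-up data with a very different magnitude in each subplot.
    ax.plot(np.arange(10), rng.integers(0, 10, size=10) * 10 ** jj)

    # Option A: data coordinates, computed from this subplot's own limits.
    bott, top = ax.get_ylim()
    left, right = ax.get_xlim()
    ax.text(left + 0.1 * (right - left), bott + 0.6 * (top - bott), "data coords")

    # Option B: axes coordinates, where (0, 0) is the bottom-left and
    # (1, 1) the top-right corner of the subplot, whatever the data range.
    texts.append(ax.text(0.1, 0.1, "axes coords", transform=ax.transAxes))
```

Option B keeps the text in place even if the limits change later. With Option A the text stays at a fixed data position and moves relative to the frame when the limits change.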

Colorbar frame and color not aligned

I have a vexing issue with a colorbar and even after vigorous research I cannot find the question even being asked. I have a plot where I overlay a contour and a pcolormesh and I would like a colorbar to indicate values. That works fine except for one thing:
The colorbar frame and color are offset
The colorbar frame and the actual bar are offset such that below you have a white bit in the frame and on top the color is poking out. While the frame is aligned with the axis as desired, the colorbar is offset.
Here is a working example that emulates the situation I was in, i.e. multiple plots with insets.
import matplotlib.gridspec as gridspec
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
figheight = 4.2 - (2.1 - 49.519 / 25.4)
matplotlib.rcParams['figure.figsize'] = (5.25, figheight)
matplotlib.rcParams['axes.linewidth'] = 0.5
fig = plt.figure()
grid = gridspec.GridSpec(2, 1, height_ratios=[49.519 / 25.4 / figheight, 2.1 / figheight])
ax0 = plt.subplot(grid[0, 0])
ax1 = plt.subplot(grid[1, 0])
plt.tight_layout()
###############################################################################################
#
# Define position of inset
#
###############################################################################################
ax1.axis('off')
pos1 = ax1.get_position()
pos2 = matplotlib.transforms.Bbox([[pos1.x0, pos1.y0],
[.8*pos1.x1,
0.8*pos1.height + pos1.y0]])
left, bottom, width, height = [pos2.x0, pos2.y0, pos2.width, pos2.height]
ax2 = fig.add_axes([left, bottom, width, height])
###############################################################################################
#
# ax2 (inset) plot
#
###############################################################################################
pos2 = ax2.get_position()
ax2.axis('on')
x = np.linspace(0,5)
z = (np.outer(np.sin(x), np.cos(x))+1)*0.5
im = ax2.pcolormesh(z)
c = ax2.contour(z, linewidths=7)
ax2pos = ax2.get_position()
cbar_axis = fig.add_axes([ax2pos.x1+0.05,ax2pos.y0, .02, ax2pos.height])
colorbar = fig.colorbar(im, ax = ax2,
cax = cbar_axis, ticks = [0.1, .5, .9])
colorbar.outline.set_visible(True)
plot = 'Minimal.pdf'
fig.savefig(plot)
plt.close()
The problem persists in both the inline display and the saved .pdf if 'Inline' graphics backend is chosen. Using tight layout or not changes how badly the offset is depending on the size of the bar - same with using PyQT5 rather than inline graphics backend. I thought it was gone when I was changing between the various combinations, but I just realized it's still there.
I would appreciate any input.
As suggested by ImportanceOfBeingErnest, I tried using np.round on the figsize, and that didn't change things. While you can fiddle around with sizes to make it look okay, the color always sticks out on one side or the other by some amount. When I change the graphics backend in Spyder 3 from 'Inline' to 'QT5', the problem becomes less severe, with or without rounding. A summary of this is in the linked picture, "Colorbar overlap cases". Note that without rounding and with PyQT5 the problem still occurs, but is not as severe.
On inspection, it is clear that the colorbar is not only bleeding out over the top of its axes, but it's also positioned slightly to the left.
So, the problem here appears to be a conflict between the position of the colorbar axis and the colorbar itself when rasterization occurs. You can find more details on this issue in matplotlib's github repository, but I'll summarize what's going on here.
Colorbars are rasterized when the output is produced, so as to avoid artifacting issues during rendering. The position of the colorbar is snapped to the nearest integer pixels during the rasterization process, while the axis is kept where it is supposed to be. Then, when the output is produced, the colorbar falls within borders of fixed pixels of the image, despite the fact that the image is, itself, vectorized. Thus, there are two strategies that can be employed to avoid this mishap.
Use a finer DPI
The conversion from vectorized coordinates to rasterized coordinates takes place assuming a given DPI on the image. By default, this is set to be 72. However, by using more DPI, the overall shift induced by the rasterization process will be smaller, as the closest pixel the colorbar will snap to will be much nearer. Here, we change the output to have fig.savefig(plot,dpi=4000), and the problem goes away:
Note, however, that on my machine, the output size changed from 62 KB to 78 KB due to this change (although the DPI adjustment was also, admittedly, extreme). If you are worried about file sizes, you should pick a lower DPI that fixes the problem.
Use a different colormap
This rasterization happens when more than 50 colors are in the colorbar. Thus, we can do a quick test, setting our colormap to Pastel1 via
im = ax2.pcolormesh(z,cmap='Pastel1')
Here, the colorbar / axis mismatch is mitigated.
As a fallback, adopting a colorbar with fewer than 50 colors should mitigate this problem.
Rasterize the Axis
For completeness, there is also a third option. If you rasterize the colorbar axis, both the axis boundaries and the colormap will be rasterized, and you'll lose the offset. This will also rasterize your labels, and the axis will shift as one, breaking alignment with the nearby axis. For this, you just need to include cbar_axis.set_rasterized(True).
First, a way to overlay a contour and a pcolormesh and create a colorbar would be the following
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
x = np.linspace(0,5)
z = (np.outer(np.sin(x), np.cos(x))+1)*0.5
fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot(111)
im = ax.pcolormesh(z)
c = ax.contour(z, linewidths=7)
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", "5%", pad="3%")
colorbar = fig.colorbar(im, cax=cax, ticks = [0.1, .5, .9])
plt.show()
Now to the problem from the question. It is of course possible to create the axes to put the colorbar in manually. Replacing the colorbar creation with the code from the question still produces a nice image.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,5)
z = (np.outer(np.sin(x), np.cos(x))+1)*0.5
fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot(111)
plt.subplots_adjust(right=0.8)
im = ax.pcolormesh(z)
c = ax.contour(z, linewidths=7)
ax2pos = ax.get_position()
cbar_axis = fig.add_axes([ax2pos.x1+0.05,ax2pos.y0, .05, ax2pos.height])
colorbar = fig.colorbar(im, ax = ax,
cax = cbar_axis, ticks = [0.1, .5, .9])
colorbar.outline.set_visible(True)
plt.show()
Conclusion so far: The issue is not reproducible, at least not without a Minimal, Complete, and Verifiable example.
I'm uncertain about the reasons for the behaviour in the example from the question. However, it seems that it can be overcome by rounding the figure height to 3 decimal places:
matplotlib.rcParams['figure.figsize'] = (5.25, np.round(figheight,3))

Correct legend color for intersecting transparent layers in Matplotlib

I often need to indicate the distribution of some data in a concise plot, as in the below figure. I do this by plotting several fill_between areas, limited by the quantiles of the distribution.
ax.fill_between(x, quantile1, quantile2, alpha=0.2)
In a for loop, I make plots like this by calculating quantiles 1 and 2 (as indicated by the legend) as the 0% to 100% quantiles, then 10% to 90% and so on, each fill_between plotting on top of the previous "layer".
Here is the output with three layers of transparent colors along with the median line (0.5):
However, the legend colors are not what I would like them to be, since they (naturally) use the color of each individual layer, not taking into account the combined effect of several layers.
ax.legend([0.5]+[['0.0%', '100.0%'], ['10.0%', '90.0%'], ['30.0%', '70.0%']])
What is the best way to overwrite the face color value within the legend command?
I would like to avoid doing this by first plotting 0% to 10% with transparency "0.2", then 10% to 30% with transparency "0.4" and so on, as this will take twice the amount of time to compute and will make the code more complicated.
You can use proxy artists to place in the legend which have the exact same transparency as the resulting overlay from the plot.
As a proxy artist you can use a simple rectangle. The transparency however needs to be calculated as two objects with transparency 0.2 together will appear as a single object with transparency 0.36 (and not 0.4!).
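The 0.36 follows from how stacked layers composite: each layer with opacity a lets a fraction 1 - a of what lies beneath show through, so n identical layers have a combined opacity of 1 - (1 - a)^n. A small check of that formula follows. The function name stacked_alpha is chosen here for illustration.

```python
def stacked_alpha(n_layers, base=0.2):
    """Combined opacity of n_layers identical layers, each with opacity base."""
    return 1 - (1 - base) ** n_layers


# Combined opacity for one, two and three stacked layers.
for n in (1, 2, 3):
    print(n, round(stacked_alpha(n), 3))
```

For two layers this gives 1 - 0.8 * 0.8 = 0.36, which is the value the legend patch needs.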
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.patches
a = np.sort(np.random.rand(6,18), axis=0)
x = np.arange(len(a[0]))
def alpha(i, base=0.2):
    l = lambda x: x+base-x*base
    ar = [l(0)]
    for j in range(i):
        ar.append(l(ar[-1]))
    return ar[-1]
fig, ax = plt.subplots(figsize=(4,2))
handles = []
labels=[]
for i in range(len(a)//2):  # integer division; range() needs an int in Python 3
    ax.fill_between(x, a[i, :], a[len(a)-1-i, :], color="blue", alpha=0.2)
    handle = matplotlib.patches.Rectangle((0,0),1,1,color="blue", alpha=alpha(i, base=0.2))
    handles.append(handle)
    label = "quant {:.1f} to {:.1f}".format(float(i)/len(a)*100, 100-float(i)/len(a)*100)
    labels.append(label)
plt.legend(handles=handles, labels=labels, framealpha=1)
plt.show()
One has to decide whether this is really worth the effort. A solution without transparency, giving much the same look, needs much less code:
import matplotlib.pyplot as plt
import numpy as np
a = np.sort(np.random.rand(6,18), axis=0)
x = np.arange(len(a[0]))
fig, ax = plt.subplots(figsize=(4,2))
for i in range(len(a)//2):  # integer division; range() needs an int in Python 3
    label = "quant {:.1f} to {:.1f}".format(float(i)/len(a)*100, 100-float(i)/len(a)*100)
    c = plt.cm.Blues(0.2+.6*(float(i)/len(a)*2) )
    ax.fill_between(x, a[i, :], a[len(a)-1-i, :], color=c, label=label)
plt.legend( framealpha=1)
plt.show()

heatmap for positive and negative values [duplicate]

I am trying to make a filled contour for a dataset. It should be fairly straightforward:
plt.contourf(x, y, z, label = 'blah', cmap = matplotlib.cm.RdBu)
However, what do I do if my dataset is not symmetric about 0? Let's say I want to go from blue (negative values) to 0 (white), to red (positive values). If my dataset goes from -8 to 3, then the white part of the color bar, which should be at 0, is in fact slightly negative. Is there some way to shift the color bar?
First off, there's more than one way to do this.
1. Pass an instance of DivergingNorm (renamed TwoSlopeNorm in Matplotlib 3.2) as the norm kwarg.
2. Use the colors kwarg to contourf and manually specify the colors.
3. Use a discrete colormap constructed with matplotlib.colors.from_levels_and_colors.
The simplest way is the first option. It is also the only option that allows you to use a continuous colormap.
The reason to use the first or third options is that they will work for any type of matplotlib plot that uses a colormap (e.g. imshow, scatter, etc).
The third option constructs a discrete colormap and normalization object from specific colors. It's basically identical to the second option, but it will a) work with other types of plots than contour plots, and b) avoids having to manually specify the number of contours.
As an example of the first option (I'll use imshow here because it makes more sense than contourf for random data, but contourf would have identical usage other than the interpolation option.):
import numpy as np
import matplotlib.pyplot as plt
# DivergingNorm was renamed TwoSlopeNorm in Matplotlib 3.2
from matplotlib.colors import TwoSlopeNorm
data = np.random.random((10,10))
data = 10 * (data - 0.8)
fig, ax = plt.subplots()
im = ax.imshow(data, norm=TwoSlopeNorm(vcenter=0), cmap=plt.cm.seismic, interpolation='none')
fig.colorbar(im)
plt.show()
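What the norm does can be checked numerically. It maps vmin to 0, the center value to 0.5 and vmax to 1, with a separate linear ramp on each side, so the middle of a diverging colormap lands exactly on 0 even when the data range is asymmetric. The sketch below uses the range from the question (-8 to 3) and assumes Matplotlib 3.2 or newer, where the class is named TwoSlopeNorm.

```python
from matplotlib.colors import TwoSlopeNorm

# Asymmetric range from the question, centered on zero.
norm = TwoSlopeNorm(vcenter=0, vmin=-8, vmax=3)

# Print the normalized position (0..1) in the colormap for a few data values.
for value in (-8, -4, 0, 1.5, 3):
    print(value, float(norm(value)))
```

Halfway between vmin and the center (-4) maps to 0.25, and halfway between the center and vmax (1.5) maps to 0.75, which is why the two sides of the colorbar are stretched differently.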
As an example of the third option (notice that this gives a discrete colormap instead of a continuous colormap):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import from_levels_and_colors
data = np.random.random((10,10))
data = 10 * (data - 0.8)
num_levels = 20
vmin, vmax = data.min(), data.max()
midpoint = 0
levels = np.linspace(vmin, vmax, num_levels)
midp = np.mean(np.c_[levels[:-1], levels[1:]], axis=1)
vals = np.interp(midp, [vmin, midpoint, vmax], [0, 0.5, 1])
colors = plt.cm.seismic(vals)
cmap, norm = from_levels_and_colors(levels, colors)
fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cmap, norm=norm, interpolation='none')
fig.colorbar(im)
plt.show()