How do I use colourmaps with variable alpha in a Seaborn kdeplot without seeing the contour lines? - matplotlib

Python version: 3.6.4 (Anaconda on Windows)
Seaborn: 0.8.1
Matplotlib: 2.1.2
I'm trying to create a 2D Kernel Density plot using Seaborn but I want each step in the colourmap to have a different alpha value. I had a look at this question to create a matplotlib colourmap with alpha values: Add alpha to an existing matplotlib colormap.
I have a problem in that the lines between contours are visible. The result I get is here:
I thought that I had found the answer when I found this question: Hide contour linestroke on pyplot.contourf to get only fills. I tried the method outlined in the answer (using set_edgecolor("face")) but it did not work in this case. That question also seemed to be related to vector graphics formats, whereas I am just writing out a PNG.
Here is my script:
import numpy as np
import seaborn as sns
import matplotlib.colors as cols
import matplotlib.pyplot as plt
def alpha_cmap(cmap):
    my_cmap = cmap(np.arange(cmap.N))
    # Set a square root alpha.
    x = np.linspace(0, 1, cmap.N)
    my_cmap[:,-1] = x ** (0.5)
    my_cmap = cols.ListedColormap(my_cmap)
    return my_cmap
xs = np.random.uniform(size=100)
ys = np.random.uniform(size=100)
kplot = sns.kdeplot(data=xs, data2=ys,
                    cmap=alpha_cmap(plt.cm.viridis),
                    shade=True,
                    shade_lowest=False,
                    n_levels=30)
plt.savefig("example_plot.png")
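The colourmap helper can be checked on its own, independently of Seaborn. The following is a minimal sketch, assuming only NumPy and Matplotlib; `plt.get_cmap("viridis")` is used instead of `plt.cm.viridis` purely as an assumption for compatibility across Matplotlib versions, and the names `base`, `faded`, `low` and `high` are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def alpha_cmap(cmap):
    """Return a copy of `cmap` whose alpha rises as the square root of position."""
    colors = cmap(np.arange(cmap.N))  # (N, 4) RGBA lookup table
    colors[:, -1] = np.linspace(0, 1, cmap.N) ** 0.5
    return ListedColormap(colors)


base = plt.get_cmap("viridis")
faded = alpha_cmap(base)

# The lowest colour is fully transparent and the highest fully opaque;
# the RGB channels are left untouched.
low, high = faded(0.0), faded(1.0)
print("alpha at 0.0:", low[3], "alpha at 1.0:", high[3])
```

Because the alpha channel lives in the colourmap itself, any artist that looks colours up through it (contour fills, images) inherits a different transparency per level.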
Guided by some comments on this question I have tried some other methods that have been successful when this problem has come up. Based on this question (Matplotlib Contourf Plots Unwanted Outlines when Alpha < 1) I have tried altering the plot call to:
sns.kdeplot(data=xs, data2=ys,
            cmap=alpha_cmap(plt.cm.viridis),
            shade=True,
            shade_lowest=False,
            n_levels=30,
            antialiased=True)
With antialiased=True the lines between contours are replaced by a narrow white line:
I have also tried an approach similar to this question - Pyplot pcolormesh confused when alpha not 1. This approach is based on looping over the PathCollections in kplot.collections and tuning the parameters of the edges so that they become invisible. I have tried adding this code and tweaking the linewidth -
for thing in kplot.collections:
    thing.set_edgecolor("face")
    thing.set_linewidth(0.01)
# kplot is the Axes returned by sns.kdeplot; redraw its figure.
kplot.figure.canvas.draw()
This results in a mix of white and dark lines.
I believe that I will not be able to tune the line width to make the lines disappear because of the variable width of the contour bands.
Using both methods (antialiasing + linewidth) makes this version, which looks cool but isn't quite what I want:
I also found this question - Changing Transparency of/Remove Contour Lines in Matplotlib
This one suggests overplotting a second plot with a different number of contour levels on the same axis, like:
kplot = sns.kdeplot(data=xs, data2=ys,
                    ax=ax,
                    cmap=alpha_cmap(plt.cm.viridis),
                    shade=True,
                    shade_lowest=False,
                    n_levels=30,
                    antialiased=True)
kplot = sns.kdeplot(data=xs, data2=ys,
                    ax=ax,
                    cmap=alpha_cmap(plt.cm.viridis),
                    shade=True,
                    shade_lowest=False,
                    n_levels=35,
                    antialiased=True)
This results in:
This is better, and almost works. The problem here is I need variable (and non-linear) alpha throughout the colourmap. The variable banding and lines seem to be a result of the combinations of alpha when contours are plotted over each other. I also still see some clear/white lines in the result.
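One possible direction, which is not taken from the post above and is offered only as a hedged sketch: the seams come from adjacent contour polygons being drawn separately, so rendering the density as a single image avoids them, since an image has no polygon edges. The example below assumes a hand-rolled Gaussian kernel with a fixed, arbitrarily chosen bandwidth (the hypothetical helper `grid_density`) rather than Seaborn's own KDE, and uses `BoundaryNorm` to keep the stepped, contour-like look with 30 bands:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the figure is only written to disk
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap


def alpha_cmap(cmap):
    """Same idea as the helper in the question: square-root alpha ramp."""
    colors = cmap(np.arange(cmap.N))
    colors[:, -1] = np.linspace(0, 1, cmap.N) ** 0.5
    return ListedColormap(colors)


def grid_density(xs, ys, bandwidth=0.1, n=200):
    """Hypothetical helper: unnormalised Gaussian kernel sum on an n x n grid."""
    gx = np.linspace(xs.min(), xs.max(), n)
    gy = np.linspace(ys.min(), ys.max(), n)
    # Squared distance from every grid node to every sample: shape (n, n, len(xs)).
    d2 = (gx[None, :, None] - xs) ** 2 + (gy[:, None, None] - ys) ** 2
    return gx, gy, np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=-1)


rng = np.random.default_rng(0)
xs = rng.uniform(size=100)
ys = rng.uniform(size=100)

gx, gy, dens = grid_density(xs, ys)
cmap = alpha_cmap(plt.get_cmap("viridis"))
levels = np.linspace(0, dens.max(), 31)  # 31 boundaries give 30 bands
norm = BoundaryNorm(levels, cmap.N)

fig, ax = plt.subplots()
im = ax.imshow(dens, origin="lower", aspect="auto", interpolation="nearest",
               extent=[gx[0], gx[-1], gy[0], gy[-1]], cmap=cmap, norm=norm)
fig.savefig("kde_image.png")
```

The lowest band maps to the fully transparent first colour, which plays a role similar to `shade_lowest=False`. The trade-off is that band edges are only as smooth as the grid resolution `n` allows.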

Related

How to fix lines of axes overlapping imshow plot?

When plotting matrices using matplotlib's imshow function the lines of the axes can overlap the actual plot, see the following minimal example (matshow is just a simple wrapper around imshow):
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(3,3))
ax.matshow(np.random.random((50, 50)), interpolation="none", cmap="Blues")
plt.savefig("example.png", dpi=300)
I would expect every entry of the matrix to be represented by a square, but in the top row it is quite obvious that the axis is hiding a bit of the plot resulting in non-square entries. The same is happening for the last column. Since I want the complete matrix to be seen - every entry with the same importance - is there any way this can be fixed?
To me, this is just a visualisation issue. If I run your code and maximise the window, I do not see the overlapping you are talking about:
Otherwise, remove the spines but without hiding the ticks:
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
EDIT
Reduce the thickness of the borders:
for spine in ax.spines.values():
    spine.set_linewidth(0.3)
The following is the exported image:
With 0.2 the exported image looks like this:

Adding a Rectangle Patch and Text Patch to 3D Collection in Matplotlib

Problem Statement
I'm attempting to add two patches -- a rectangle patch and a text patch -- to the same space within a 3D plot. The ultimate goal is to annotate the rectangle patch with a corresponding value (about 20 rectangles across 4 planes -- see Figure 3). The following code does not get all the way there, but does demonstrate a rendering issue where sometimes the text patch is completely visible and sometimes it isn't -- interestingly, if the string doesn't extend outside the rectangle patch, it never seems to become visible at all. The only difference between Figures 1 and 2 is the rotation of the plot viewer image. I've left the cmap code in the example below because it's a requirement of the project (and just in case it affects the outcome).
Things I've Tried
Reversing the order that the patches are drawn.
Applying zorder values -- I think art3d.pathpatch_2d_to_3d is overriding that.
Creating a patch collection -- I can't seem to find a way to add the rectangle patch and the text patch to the same 3D collection.
Conclusion
I suspect that setting zorder to each patch before adding them to a 3D collection may be the solution, but I can't seem to find a way to get to that outcome. Similar questions suggest this, but I haven't been able to apply their answers to this problem specifically.
Environment
macOS: Big Sur 11.2.3
Python 3.8
Matplotlib 3.3.4
Figure 1
Figure 2
Figure 3
The Code
Generates Figures 1 and 2 (not 3).
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
from matplotlib.patches import Rectangle, PathPatch
from matplotlib.text import TextPath
from matplotlib.transforms import Affine2D
import mpl_toolkits.mplot3d.art3d as art3d
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
plt.style.use('dark_background')
fig = plt.figure()
# fig.gca(projection='3d') works on Matplotlib 3.3.4 but was removed in later releases
ax = fig.add_subplot(projection='3d')
cmap = plt.cm.bwr
norm = Normalize(vmin=50, vmax=80)
base_color = cmap(norm(50))
# Draw box
box = Rectangle((25, 25), width=50, height=50, color=cmap(norm(62)), ec='black', alpha=1)
ax.add_patch(box)
art3d.pathpatch_2d_to_3d(box, z=1, zdir="z")
# Draw text
text_path = TextPath((60, 50), "xxxx", size=10)
trans = Affine2D().rotate(0).translate(0, 1)
p1 = PathPatch(trans.transform_path(text_path))
ax.add_patch(p1)
art3d.pathpatch_2d_to_3d(p1, z=1, zdir="z")
ax.set_xlabel('x')
ax.set_xlim(0, 100)
ax.set_xticklabels([])
ax.xaxis.set_pane_color(base_color)
ax.set_ylabel('y')
ax.set_ylim(0, 100)
ax.set_yticklabels([])
ax.yaxis.set_pane_color(base_color)
ax.set_zlabel('z')
ax.set_zlim(1, 4)
ax.set_zticks([1, 2, 3, 4])
ax.zaxis.set_pane_color(base_color)
ax.set_zticklabels([])
plt.show()
This is a well-known problem with matplotlib 3D plotting: objects are drawn in a particular order, and those plotted last appear on "top" of the others, regardless of which should be in front in a "true" 3D plot.
See the FAQ here: https://matplotlib.org/mpl_toolkits/mplot3d/faq.html#my-3d-plot-doesn-t-look-right-at-certain-viewing-angles
My 3D plot doesn’t look right at certain viewing angles
This is probably the most commonly reported issue with mplot3d. The problem is that – from some viewing angles – a 3D object would appear in front of another object, even though it is physically behind it. This can result in plots that do not look “physically correct.”
Unfortunately, while some work is being done to reduce the occurrence of this artifact, it is currently an intractable problem, and can not be fully solved until matplotlib supports 3D graphics rendering at its core.
The problem occurs due to the reduction of 3D data down to 2D + z-order scalar. A single value represents the 3rd dimension for all parts of 3D objects in a collection. Therefore, when the bounding boxes of two collections intersect, it becomes possible for this artifact to occur. Furthermore, the intersection of two 3D objects (such as polygons or patches) can not be rendered properly in matplotlib’s 2D rendering engine.
This problem will likely not be solved until OpenGL support is added to all of the backends (patches are greatly welcomed). Until then, if you need complex 3D scenes, we recommend using MayaVi.

Different level of transparency for edgeline and fill in matplotlib or seaborn distribution plot

I would like to set different levels of transparency (= alpha) for the edge line and fill of a distribution plot that I created in matplotlib/seaborn. For example:
ax1 = sns.distplot(BSRDI_DF, label="BsrDI", bins=newBins, kde=False,
hist_kws={"edgecolor": (1,0,0,1), "color":(1,0,0,0.25)})
The above approach does not work, unfortunately. Does anybody have any idea how I could accomplish this?
The problem seems to be that seaborn sets an alpha parameter for the histogram. While alpha defaults to None for a usual histogram, such that something like
plt.hist(x, lw=3, edgecolor=(1,0,0,0.75), color=(1,0,0,0.25))
works as expected, seaborn sets this alpha to some given value. This overwrites the alpha that is set in the RGBA tuples.
The solution is to set alpha explicitly to None:
ax = sns.distplot(x, kde=False, hist_kws={"lw":3, "edgecolor": (1,0,0,0.75),
"color":(1,0,0,0.25),"alpha":None})
A complete example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(60)
ax = sns.distplot(x, label="BsrDI", bins=np.linspace(-3,3,10), kde=False,
hist_kws={"lw":3, "edgecolor": (1,0,0,0.75),
"color":(1,0,0,0.25),"alpha":None})
plt.show()
EDIT Nevermind, I thought using color instead of facecolor was causing the problem but it seems the output that I got only looked right because the patches were overlapping, giving seemingly darker edges.
After investigating the issue further, it looks like seaborn is hard-setting the alpha level at 0.4, which supersedes the arguments passed to hist_kws:
sns.distplot(x, kde=False, hist_kws={"edgecolor": (1,0,0,1), "lw":5, "facecolor":(0,1,0,0.1), "rwidth":0.8})
While using the same parameters to plt.hist() gives:
plt.hist(x, edgecolor=(1,0,0,1), lw=5, facecolor=(0,1,0,0.1), rwidth=0.8)
Conclusion: if you want different alpha levels for edges and face colors, you'll have to use matplotlib directly, and not seaborn.
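The behaviour described for plain Matplotlib can be checked by inspecting the bar patches directly. This is a minimal sketch, assuming random normal data in place of the asker's `BSRDI_DF`; it only demonstrates that `hist` keeps the alpha stored in each RGBA tuple when no global `alpha` is passed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(60)  # stand-in data, not the asker's

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    x,
    bins=np.linspace(-3, 3, 10),
    facecolor=(1, 0, 0, 0.25),   # faint red fill
    edgecolor=(1, 0, 0, 0.75),   # stronger red outline
    linewidth=3,
)

# With no artist-level alpha set, each patch keeps the alpha from its RGBA tuple.
face_alpha = patches[0].get_facecolor()[3]
edge_alpha = patches[0].get_edgecolor()[3]
print(face_alpha, edge_alpha, patches[0].get_alpha())
```

If an artist-level `alpha` is set, it overrides both values, which is consistent with the effect both answers attribute to seaborn.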

Figures with lots of data points in matplotlib

I generated the attached image using matplotlib (png format). I would like to use eps or pdf, but I find that with all the data points, the figure is really slow to render on the screen. Other than just plotting less of the data, is there any way to optimize it so that it loads faster?
I think you have three options:
1. As you mentioned yourself, you can plot fewer points. For the plot you showed in your question I think it would be fine to only plot every other point.
2. As @tcaswell stated in his comment, you can use a line instead of points, which will be rendered more efficiently.
3. You could rasterize the blue dots. Matplotlib allows you to selectively rasterize single artists, so if you pass rasterized=True to the plotting command you will get a bitmapped version of the points in the output file. This will be way faster to load at the price of limited zooming due to the resolution of the bitmap. (Note that the axes and all the other elements of the plot will remain as vector graphics and font elements.)
First, if you want to show a "trend" in your plot, and the x, y arrays you are plotting are "huge", you could apply random sub-sampling to keep only a fraction of your data:
import numpy as np
import matplotlib.pyplot as plt
fraction = 0.50
x_resampled = []
y_resampled = []
for k in range(0,len(x)):
    if np.random.rand() < fraction:
        x_resampled.append(x[k])
        y_resampled.append(y[k])
plt.scatter(x_resampled,y_resampled , s=6)
plt.show()
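The same random sub-sampling can be written without an explicit Python loop by using a boolean mask. This is a minimal sketch, assuming synthetic uniform data in place of the asker's arrays; the name `keep` is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=10_000)  # stand-in for the real x data
y = rng.uniform(size=10_000)  # stand-in for the real y data

fraction = 0.50
# Keep each point independently with probability `fraction`.
keep = rng.random(x.size) < fraction
x_resampled = x[keep]
y_resampled = y[keep]

print(x_resampled.size, "of", x.size, "points kept")
```

Using one mask for both arrays keeps each x value paired with its y value, and for large arrays it avoids the per-element overhead of the loop.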
Second, have you considered using log-scale in the x-axis to increase visibility?
In this example, only the plotting area is rasterized; the axes are still in vector format:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
# rasterized=True embeds the points as a bitmap inside the PDF
plt.scatter(x, y, marker='x', rasterized=True)
plt.savefig("rasterized.pdf", format='pdf')

Dotted line style from non-evenly distributed data

I'm new to Python and Matplotlib.
This is my first posting to Stack Overflow - I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how Matplotlib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround which will make the dots on the dotted line type appear evenly distributed; even if both x and y data are non-evenly distributed; whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x-array results in the line being 'cluttered'. With unsorted x values, the dotted line is drawn back and forth over the same stretch many times, so it seems like normal behavior to me that it gets smeared wherever several dotted segments lie on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since it's a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The last should be faster if x1 becomes very large.
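Putting the two variants side by side shows that they describe the same straight line. This is a minimal sketch, not the answer's exact code: it assumes `np.polyfit` as a stand-in for `scipy.stats.linregress` to avoid the SciPy dependency, and it plots the line against the two extreme x values rather than the original unsorted `x1`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the figure is only written to disk
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x1 = 10 * rng.random(40)
y = 5 * rng.random(40)

# Degree-1 least-squares fit (assumption: used here instead of stats.linregress).
slope, intercept = np.polyfit(x1, y, 1)

# Variant 1: evaluate the line on the sorted x values.
x1_sorted = np.sort(x1)
line_sorted = slope * x1_sorted + intercept

# Variant 2: two points are enough to define a straight line.
x1_extremes = np.array([x1.min(), x1.max()])
line_extremes = slope * x1_extremes + intercept

fig, ax = plt.subplots()
ax.scatter(x1, y, color="blue", marker="o")
# Pass the matching x values, so the dotted pattern is drawn once, left to right.
ax.plot(x1_extremes, line_extremes, "r:", label="Regression Line")
ax.legend(loc="upper right")
fig.savefig("regression_dotted.png")
```

Whichever variant is used, the x array passed to `plot` has to be the one the line was computed from; plotting the new `line` against the original unsorted `x1` would pair the values incorrectly.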
With regard to your last comment: in your example you use what's called the 'state-machine' environment for plotting. It means that the commands you issue are applied to the active figure and the active axes (subplots).
You can also consider the OO (object-oriented) approach, where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. It's useful when passing an axes to a function, for example.
In your example both would work equally well and it would be more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...
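As an illustration of passing an axes to a function, both subplots from the question can be filled by one helper. This is a hedged sketch: `plot_regression` is a hypothetical name, and `np.polyfit` is again assumed in place of `scipy.stats.linregress`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import numpy as np
import matplotlib.pyplot as plt


def plot_regression(ax, x, y):
    """Scatter x against y on `ax` and overlay a dotted least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    ends = np.array([x.min(), x.max()])  # two points define the line
    ax.scatter(x, y, color="blue", marker="o")
    ax.plot(ends, slope * ends + intercept, "r:", label="Regression Line")
    ax.legend(loc="upper right")
    return slope, intercept


rng = np.random.default_rng(0)
y = 5 * rng.random(40)

# Create a figure with 2 subplots (2 rows, 1 column) and fill each explicitly.
fig, axs = plt.subplots(2, 1)
fit_random = plot_regression(axs[0], 10 * rng.random(40), y)
fit_even = plot_regression(axs[1], np.linspace(0, 10, 40), y)
```

The function never touches the "active" axes, so the order of the calls does not matter and the same helper can be reused on any figure layout.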