Efficiently Plotting Many Lines in VisPy - matplotlib

From all example code/demos I have seen in the VisPy library, I only see one way that people plot many lines, for example:
for i in range(N):
    pos = pos.copy()
    pos[:, 1] = np.random.normal(scale=5, loc=(i+1)*30, size=N)
    line = scene.visuals.Line(pos=pos, color=color, parent=canvas.scene)
    lines.append(line)
canvas.show()
My issue is that I have many lines to plot (each several hundred thousand points). Matplotlib proved too slow because the total number of points plotted was in the millions, so I switched to VisPy. But VisPy is even slower when you plot thousands of lines, each with thousands of points (the speed-up comes when you have millions of points).
The root cause is in the way lines are drawn. When you create a plot widget and then plot a line, each line is rendered to the canvas. In matplotlib you can explicitly state to not show the canvas until all lines are drawn in memory, but there doesn't appear to be the same functionality in VisPy, making it useless.
Is there any way around this? I need to plot multiple lines so that I can change properties interactively, so flattening all the data points into one plot call won't work.
(I am using PyQt4 to embed the plot in a GUI. I have also considered pyqtgraph.)

You should pass an array to the "connect" parameter of the Line() function.
xy = np.random.rand(5,2) # 2D positions
# Create an array of point connections :
toconnect = np.array([[0,1], [0,2], [1,4], [2,3], [2,4]])
# Point 0 in your xy will be connected with 1 and 2, point
# 1 with 4 and point 2 with 3 and 4.
line = scene.visuals.Line(pos=xy, connect=toconnect)
You only add one object to your canvas, but the control per line is more limited.
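A minimal sketch of how several separate lines could be packed into one pos/connect pair, so that a single Line visual draws them all. The helper name stack_lines and the sample data are illustrative assumptions, not part of VisPy; only NumPy is used here.

```python
import numpy as np


def stack_lines(lines):
    """Concatenate (n_i, 2) arrays and build a connect array that links
    consecutive points within each line, never across two lines."""
    pos = np.concatenate(lines, axis=0)
    lengths = np.array([len(line) for line in lines])
    # Index of the first point of each line inside the stacked array.
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    segments = []
    for start, n in zip(starts, lengths):
        idx = np.arange(start, start + n - 1)
        segments.append(np.column_stack((idx, idx + 1)))
    connect = np.concatenate(segments, axis=0)
    return pos, connect


rng = np.random.default_rng(0)
lines = [rng.random((4, 2)), rng.random((3, 2))]  # two short example lines
pos, connect = stack_lines(lines)
```

The resulting arrays would then be passed as scene.visuals.Line(pos=pos, connect=connect, parent=canvas.scene). Since the offset of each line inside pos is known, one line's points can still be changed later by editing that slice of pos and handing the array back to the visual with set_data.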

Related

Adding a Rectangle Patch and Text Patch to 3D Collection in Matplotlib

Problem Statement
I'm attempting to add two patches -- a rectangle patch and a text patch -- to the same space within a 3D plot. The ultimate goal is to annotate the rectangle patch with a corresponding value (about 20 rectangles across 4 planes -- see Figure 3). The following code does not get all the way there, but does demonstrate a rendering issue where sometimes the text patch is completely visible and sometimes it isn't -- interestingly, if the string doesn't extend outside the rectangle patch, it never seems to become visible at all. The only difference between Figures 1 and 2 is the rotation of the plot viewer image. I've left the cmap code in the example below because it's a requirement of the project (and just in case it affects the outcome).
Things I've Tried
Reversing the order that the patches are drawn.
Applying zorder values -- I think art3d.pathpatch_2d_to_3d is overriding that.
Creating a patch collection -- I can't seem to find a way to add the rectangle patch and the text patch to the same 3D collection.
Conclusion
I suspect that setting zorder to each patch before adding them to a 3D collection may be the solution, but I can't seem to find a way to get to that outcome. Similar questions suggest this, but I haven't been able to apply their answers to this problem specifically.
Environment
macOS: Big Sur 11.2.3
Python 3.8
Matplotlib 3.3.4
Figure 1
Figure 2
Figure 3
The Code
Generates Figures 1 and 2 (not 3).
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
from matplotlib.patches import Rectangle, PathPatch
from matplotlib.text import TextPath
from matplotlib.transforms import Affine2D
import mpl_toolkits.mplot3d.art3d as art3d
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
plt.style.use('dark_background')
fig = plt.figure()
# add_subplot replaces fig.gca(projection='3d'), which is deprecated since Matplotlib 3.4
ax = fig.add_subplot(projection='3d')
cmap = plt.cm.bwr
norm = Normalize(vmin=50, vmax=80)
base_color = cmap(norm(50))
# Draw box
box = Rectangle((25, 25), width=50, height=50, color=cmap(norm(62)), ec='black', alpha=1)
ax.add_patch(box)
art3d.pathpatch_2d_to_3d(box, z=1, zdir="z")
# Draw text
text_path = TextPath((60, 50), "xxxx", size=10)
trans = Affine2D().rotate(0).translate(0, 1)
p1 = PathPatch(trans.transform_path(text_path))
ax.add_patch(p1)
art3d.pathpatch_2d_to_3d(p1, z=1, zdir="z")
ax.set_xlabel('x')
ax.set_xlim(0, 100)
ax.set_xticklabels([])
ax.xaxis.set_pane_color(base_color)
ax.set_ylabel('y')
ax.set_ylim(0, 100)
ax.set_yticklabels([])
ax.yaxis.set_pane_color(base_color)
ax.set_zlabel('z')
ax.set_zlim(1, 4)
ax.set_zticks([1, 2, 3, 4])
ax.zaxis.set_pane_color(base_color)
ax.set_zticklabels([])
plt.show()
This is a well-known problem with matplotlib 3D plotting: objects are drawn in a particular order, and those plotted last appear on "top" of the others, regardless of which should be in front in a "true" 3D plot.
See the FAQ here: https://matplotlib.org/mpl_toolkits/mplot3d/faq.html#my-3d-plot-doesn-t-look-right-at-certain-viewing-angles
My 3D plot doesn’t look right at certain viewing angles
This is probably the most commonly reported issue with mplot3d. The problem is that – from some viewing angles – a 3D object would appear in front of another object, even though it is physically behind it. This can result in plots that do not look “physically correct.”
Unfortunately, while some work is being done to reduce the occurrence of this artifact, it is currently an intractable problem, and can not be fully solved until matplotlib supports 3D graphics rendering at its core.
The problem occurs due to the reduction of 3D data down to 2D + z-order scalar. A single value represents the 3rd dimension for all parts of 3D objects in a collection. Therefore, when the bounding boxes of two collections intersect, it becomes possible for this artifact to occur. Furthermore, the intersection of two 3D objects (such as polygons or patches) can not be rendered properly in matplotlib’s 2D rendering engine.
This problem will likely not be solved until OpenGL support is added to all of the backends (patches are greatly welcomed). Until then, if you need complex 3D scenes, we recommend using MayaVi.
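The FAQ text above predates a later addition: as far as I know, Matplotlib 3.5 and newer let a 3D axes skip its automatic depth sorting via computed_zorder=False, after which the ordinary zorder values are used. The sketch below assumes such a version (so it will not run on the 3.3.4 from the question), and the colours and text position are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, PathPatch
from matplotlib.text import TextPath
import mpl_toolkits.mplot3d.art3d as art3d

fig = plt.figure()
# Assumption: Matplotlib >= 3.5, where Axes3D accepts computed_zorder.
ax = fig.add_subplot(projection="3d", computed_zorder=False)

# Rectangle drawn first (lower zorder).
box = Rectangle((25, 25), width=50, height=50,
                facecolor="tab:red", edgecolor="black", zorder=1)
ax.add_patch(box)
art3d.pathpatch_2d_to_3d(box, z=1, zdir="z")

# Text patch in the same plane, drawn on top (higher zorder).
label = PathPatch(TextPath((40, 50), "xxxx", size=10),
                  facecolor="white", zorder=2)
ax.add_patch(label)
art3d.pathpatch_2d_to_3d(label, z=1, zdir="z")

ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.set_zlim(1, 4)
fig.canvas.draw()  # render once to confirm both patches draw without error
```

With manual ordering the draw order no longer depends on the viewing angle, which suits flat annotations sharing one plane but gives wrong results for objects that should genuinely occlude each other.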

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, for some the x axis collapses slightly (see image, left) or strongly (see image, middle).
https://i.imgur.com/dxLR4B4.png
Looking into how seaborn plots a box/violin plot (https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py, line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing by code only?
I do not want to modify my dataframe (replace the NaNs with zero).
My code is below:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose size differs per cancer type
(depending on whether any category is completely NaN).
I expect every plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
For instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Heatmap colorbars accumulating in Matplotlib/Seaborn figures

I have a list of data frames, and I want to make heatmaps of every data frame in the list. The first heatmap comes out perfectly, but the second one has two colorbars, one much larger than the other, which distorts the figure. The third has THREE colorbars, the last one being even larger, and this continues for as many heatmaps as I make.
This seems like a bug to me, as I have no idea why it's happening. Each heatmap should be stored as a separate element in the list of heatmaps, and even if I plot them individually, instead of using a loop or list comprehension, I get the same problem.
Here is my code:
# Set the seaborn font size.
sns.set(font_scale=0.5)
# Ensure that labels are not cut off.
plt.gcf().subplots_adjust(bottom=0.18)
plt.gcf().subplots_adjust(right=.3)
black_yellow = sns.dark_palette("yellow",10)
heatmap_list = [sns.heatmap(df, cmap=black_yellow, xticklabels=True, yticklabels=True) for df in df_list]
[heatmap_list[x].figure.savefig(file_names_list[x]+'.pdf', format='pdf') for x in range(0,len(heatmap_list))]
sns.heatmap() causes a problem when it is called in a loop: each call draws onto the current figure and adds another colorbar to it. To resolve this, run the first iteration on its own and keep the rest of the loop the same, but pass cbar=False in the loop portion to stop the colorbars from accumulating.
# Set the seaborn font size.
sns.set(font_scale=0.5)
# Ensure that labels are not cut off.
plt.gcf().subplots_adjust(bottom=0.18)
plt.gcf().subplots_adjust(right=.3)
black_yellow = sns.dark_palette("yellow", 10)
hm = sns.heatmap(df_list[0], cmap=black_yellow, xticklabels=True, yticklabels=True)
hm.figure.savefig(file_names_list[0]+'.pdf', format='pdf')
heatmap_list = [sns.heatmap(df_list[i], cmap=black_yellow, xticklabels=True, yticklabels=True, cbar=False) for i in range(1, len(df_list))]
[heatmap_list[x].figure.savefig(file_names_list[x+1]+'.pdf', format='pdf') for x in range(0, len(heatmap_list))]
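An alternative sketch, not taken from the answer above: give every heatmap its own figure and axes, so nothing carries over from one plot to the next and each file keeps exactly one colorbar. The random data frames, the temporary output folder, and the heatmap_<i> file names are placeholders standing in for the question's df_list and file_names_list.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder inputs standing in for the question's df_list / file_names_list.
rng = np.random.default_rng(0)
df_list = [pd.DataFrame(rng.random((4, 3)), columns=list("abc")) for _ in range(3)]
out_dir = tempfile.mkdtemp()
file_names_list = [os.path.join(out_dir, f"heatmap_{i}") for i in range(len(df_list))]

black_yellow = sns.dark_palette("yellow", 10)

axes_per_figure = []
for df, name in zip(df_list, file_names_list):
    fig, ax = plt.subplots()  # fresh figure for every heatmap
    sns.heatmap(df, cmap=black_yellow, xticklabels=True, yticklabels=True, ax=ax)
    axes_per_figure.append(len(fig.axes))  # heatmap axes plus one colorbar axes
    fig.savefig(name + ".pdf", format="pdf")
    plt.close(fig)  # release the figure so it cannot be drawn on again
```

Compared with cbar=False, this keeps a colorbar on every saved heatmap and scales each colorbar to its own data.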

Matplotlib video creation

EDIT: ImportanceOfBeingErnest provided the answer; however, I am still inviting you all to explain why the savefig logic is different from the animation logic.
I want to make a video in matplotlib. I went through manuals and examples and I just don't get it. (Regarding matplotlib, I always copy examples, because after five years of Python and two years of matplotlib I still understand 0.0% of matplotlib syntax.)
After half a dozen hours, here is what I came up with. Well, I get an empty video. No idea why.
import os
import math
import matplotlib
matplotlib.use("Agg")
from matplotlib import pyplot as plt
import matplotlib.animation as animation
# Set up formatting for the movie files
Writer = animation.writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800)
numb=100
temp=[0.0]*numb
cont=[0.0]*numb
for i in range(int(4*numb/10),int(6*numb/10)):
    temp[i]=2
    cont[i]=2
fig = plt.figure()
plts=fig.add_subplot(1,1,1)
plts.set_ylim([0,2.1])
plts.get_xaxis().set_visible(False)
plts.get_yaxis().set_visible(False)
ims = []
for i in range(1,10):
    line1, = plts.plot(range(0,numb),temp, linewidth=1, color='black')
    line2, = plts.plot(range(0,numb),cont, linewidth=1, color='red')
    # savefig is here for testing, works perfectly!
    # fig.savefig('test'+str(i)+'.png', bbox_inches='tight', dpi=300)
    ims.append([line1,line2])
    plts.lines.remove(line1)
    plts.lines.remove(line2)
    for j in range(1,10):
        tempa=0
        for k in range (1,numb-1):
            tempb=temp[k]+0.51*(temp[k-1]-2*temp[k]+temp[k+1])
            temp[k-1]=tempa
            tempa=tempb
        temp[numb-1]=0
    for j in range(1,20):
        conta=0
        for k in range (1,numb-1):
            contb=cont[k]+0.255*(cont[k-1]-2*cont[k]+cont[k+1])
            cont[k-1]=conta
            conta=contb
        cont[numb-1]=0
im_ani = animation.ArtistAnimation(fig, ims, interval=50, repeat_delay=3000,blit=True)
im_ani.save('im.mp4', writer=writer)
Can someone help me with this?
If you want to have a plot which is not empty, the main idea would be not to remove the lines from the plot.
That is, delete the two lines
plts.lines.remove(line1)
plts.lines.remove(line2)
If you delete these two lines, the output will look something like this:
[Link to original size animation]
Now one might ask: why do I not need to remove the artists in each iteration step, given that otherwise all of the lines would populate the canvas at once?
The answer is that ArtistAnimation takes care of this. It will only show those artists in the supplied list that correspond to the given time step. So while at the end of the for loop you end up with all the lines drawn to the canvas, once the animation starts they will all be hidden and only one set of artists is shown at a time.
In such a case it is of course not a good idea to use the loop for saving the individual images, as the final image would contain all of the drawn lines at once.
The solution is then either to make two runs of the script, one for the animation and one where the lines are removed in each time step, or, maybe better, to use the animation itself to create the images.
im_ani.save('im.png', writer="imagemagick")
will create the images as im-<nr>.png in the current folder. This requires imagemagick to be installed.
I'm trying here to answer the two questions from the comments:
1. I have appended line1 and line2 before deleting them. Still they disappeared in the final result. How come?
You have appended the lines to a list. After that you removed the lines from the axes. Now the lines are in the list but not part of the axes. When doing the animation, matplotlib finds the lines in the list and makes them visible. But they are not in the axes (because they have been removed) so the visibility of some Line2D object, which does not live in any axes but only somewhere in memory, is changed. But that isn't reflected in the plot because the plot doesn't know this line any more.
2. If I understand right, when you issue line1, = plts.plot... command then the line1 plot object is added to the plts graph object. However, if you change the line1 plot object by issuing line1, = plts.plot... command again, matplotlib does change line1 object but before that saves the old line1 to the plts graph object permanently. Is this what caused my problem?
No. The first step is correct, line1, = plts.plot(..) adds a Line2D object to the axes. However, in a later loop step line1, = plts.plot() creates another Line2D object and puts it to the canvas. The initial Line2D object is not changed and it doesn't know that there is now some other line next to it in the plot. Therefore, if you don't remove the lines they will all be visible in the static plot at the end.
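The points above can be condensed into a small sketch: every frame's line stays on the axes, and ArtistAnimation is left to switch visibility per frame. The sine data and the variable names are illustrative assumptions, unrelated to the diffusion code in the question, and saving is left out so the sketch does not depend on ffmpeg.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)

frames = []
for i in range(5):
    # Each call creates a new Line2D and adds it to the axes.
    line, = ax.plot(x, np.sin(x + 0.3 * i), color="black")
    # Keep the artist on the axes; removing it here would leave the
    # animation with lines that no longer belong to any plot.
    frames.append([line])

ani = animation.ArtistAnimation(fig, frames, interval=50, blit=True)

n_lines_on_axes = len(ax.lines)  # all five lines are still attached
# ani.save("im.mp4", writer="ffmpeg")  # would need ffmpeg installed
```

At this point a plain fig.savefig would show all five curves at once, which is exactly the difference between the static figure and the animation described above.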

Dotted line style from non-evenly distributed data

I'm new to Python and MatPlotlib.
This is my first posting to Stackoverflow - I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how MatplotLib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround which will make the dots on the dotted line type appear evenly distributed; even if both x and y data are non-evenly distributed; whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x-array results in the line being 'cluttered'. But you draw a dotted line repeatedly over the same location, so it seems like normal behavior to me that it gets smeared at places where there are multiple dotted lines on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since it's a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The latter should be faster if x1 becomes very large.
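Put together as a runnable sketch, the min/max variant might look like the following. To keep it NumPy-only, np.polyfit stands in for stats.linregress here (an assumption of this sketch; both fit the same straight line), and the seeded random data is a placeholder for the question's arrays.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x1 = 10 * rng.random(40)  # unevenly spaced, unsorted x values
y = 5 * rng.random(40)

# Degree-1 polynomial fit: returns slope first, then intercept.
slope, intercept = np.polyfit(x1, y, 1)

# Two end points are enough to draw a straight line, so the dotted
# pattern is laid down once instead of being retraced back and forth.
x_ends = np.array([x1.min(), x1.max()])
y_ends = slope * x_ends + intercept

fig, ax = plt.subplots()
ax.scatter(x1, y, color="blue", marker="o")
reg_line, = ax.plot(x_ends, y_ends, "r:", label="Regression Line")
ax.legend(loc="upper right")
```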
With regard to your last comment: in your example you use what's called the 'state-machine' environment for plotting. It means that the specified commands are applied to the active figure and the active axes (subplots).
You can also consider the OO approach, where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. It's useful when passing an axes to a function, for example.
In your example both would work equally well and it would be more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...