Heatmap colorbars accumulating in Matplotlib/Seaborn figures

I have a list of data frames, and I want to make heatmaps of every data frame in the list. The first heatmap comes out perfectly, but the second one has two colorbars, one much larger than the other, which distorts the figure. The third has THREE colorbars, the last one being even larger, and this continues for as many heatmaps as I make.
This seems like a bug to me, as I have no idea why it's happening. Each heatmap should be stored as a separate element in the list of heatmaps, and even if I plot them individually, instead of using a loop or list comprehension, I get the same problem.
Here is my code:
# Set the seaborn font size.
sns.set(font_scale=0.5)
# Ensure that labels are not cut off.
plt.gcf().subplots_adjust(bottom=0.18)
plt.gcf().subplots_adjust(right=.3)
black_yellow = sns.dark_palette("yellow",10)
heatmap_list = [sns.heatmap(df, cmap=black_yellow, xticklabels=True, yticklabels=True) for df in df_list]
[heatmap_list[x].figure.savefig(file_names_list[x]+'.pdf', format='pdf') for x in range(0,len(heatmap_list))]

sns.heatmap() draws on the current axes, so calling it repeatedly in a loop adds another colorbar to the same figure each time. To work around this, draw the first heatmap on its own and keep the rest of the loop the same, but pass cbar=False in the loop portion to stop the colorbars from accumulating.
# Set the seaborn font size.
sns.set(font_scale=0.5)
# Ensure that labels are not cut off.
plt.gcf().subplots_adjust(bottom=0.18)
plt.gcf().subplots_adjust(right=.3)
black_yellow = sns.dark_palette("yellow", 10)
# Draw the first heatmap with its colorbar and save it.
hm = sns.heatmap(df_list[0], cmap=black_yellow, xticklabels=True, yticklabels=True)
hm.figure.savefig(file_names_list[0] + '.pdf', format='pdf')
# Draw the remaining heatmaps without adding another colorbar, saving each one as it is drawn.
for df, name in zip(df_list[1:], file_names_list[1:]):
    hm = sns.heatmap(df, cmap=black_yellow, xticklabels=True, yticklabels=True, cbar=False)
    hm.figure.savefig(name + '.pdf', format='pdf')
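An alternative to cbar=False, sketched here as an assumption rather than as part of the answer above, is to give every heatmap its own figure and pass the axes explicitly through the ax= parameter, so that no colorbar can carry over from a previous call. The random data frames and the helper name draw_heatmaps are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def draw_heatmaps(df_list):
    """Draw each data frame on its own figure and return the figures."""
    cmap = sns.dark_palette("yellow", 10)
    figures = []
    for df in df_list:
        # A fresh figure per heatmap: nothing is shared between iterations.
        fig, ax = plt.subplots()
        sns.heatmap(df, cmap=cmap, xticklabels=True, yticklabels=True, ax=ax)
        figures.append(fig)
    return figures


# Hypothetical stand-in data for df_list.
rng = np.random.default_rng(0)
df_list = [pd.DataFrame(rng.random((4, 4))) for _ in range(3)]

figures = draw_heatmaps(df_list)
# Each figure should hold the heatmap axes plus exactly one colorbar axes.
axes_counts = [len(fig.axes) for fig in figures]
for fig in figures:
    # fig.savefig(...) would go here, before closing the figure.
    plt.close(fig)
```

Closing each figure after saving also keeps memory use flat when the list of data frames is long.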

Related

Matplotlib subplots: actual number of subplots unknown in advance

I'm trying to add subplots in a loop only under a certain condition, so the number of subplots is unknown in advance (although I know the limit would be the number of iterations). I tried the following:
Calling fig, ax = plt.subplots(iterations_limit) and then ax[i].remove() to remove the extra axes. This works, but a blank space is left in the figure for the axes that were removed.
Creating a new subplot after the number of subplots becomes known, and copying the axes objects from ax. It seems that copying axes doesn't quite work either.
How can I either remove the extra axes while also removing their assigned space in the figure, OR only add the subplots that I need without knowing their count in advance?
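One possible approach, which is not from the original thread and is offered only as a sketch, is to split the work into two passes: first decide which iterations need a subplot, then create exactly that many axes. The condition (even iteration numbers) and the plotted curves are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

iterations_limit = 6
x = np.linspace(0, 1, 50)

# First pass: decide which iterations get a subplot (placeholder condition).
selected = [i for i in range(iterations_limit) if i % 2 == 0]

# Second pass: create exactly as many axes as are needed.
# squeeze=False keeps `axes` two-dimensional even for a single subplot.
fig, axes = plt.subplots(len(selected), squeeze=False)
for ax, i in zip(axes[:, 0], selected):
    ax.plot(x, x ** (i + 1))
    ax.set_title(f"iteration {i}")

n_axes = len(fig.axes)
plt.close(fig)
```

This assumes the condition can be evaluated before plotting, and that at least one iteration is selected.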

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn plots a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing through code only?
I do not want to modify my dataframe (for example, by replacing the NaNs with zero).
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose width differs per cancer type
(depending on whether any category is completely NaN).
I expect every plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Efficiently Plotting Many Lines in VisPy

From all example code/demos I have seen in the VisPy library, I only see one way that people plot many lines, for example:
for i in range(N):
    pos = pos.copy()
    pos[:, 1] = np.random.normal(scale=5, loc=(i+1)*30, size=N)
    line = scene.visuals.Line(pos=pos, color=color, parent=canvas.scene)
    lines.append(line)
canvas.show()
My issue is that I have many lines to plot (each several hundred thousand points). Matplotlib proved too slow because the total number of points plotted was in the millions, hence I switched to VisPy. But VisPy is even slower when you plot thousands of lines each with thousands of points (the speed-up comes when you have millions of points).
The root cause is in the way lines are drawn. When you create a plot widget and then plot a line, each line is rendered to the canvas. In matplotlib you can explicitly state to not show the canvas until all lines are drawn in memory, but there doesn't appear to be the same functionality in VisPy, making it useless.
Is there any way around this? I need to plot multiple lines so that I can change properties interactively, so flattening all the data points into one plot call won't work.
(I am using a PyQt4 to embed the plot in a GUI. I have also considered pyqtgraph.)
You should pass an array to the "connect" parameter of the Line() function.
xy = np.random.rand(5,2) # 2D positions
# Create an array of point connections :
toconnect = np.array([[0,1], [0,2], [1,4], [2,3], [2,4]])
# Point 0 in your xy will be connected with 1 and 2, point
# 1 with 4 and point 2 with 3 and 4.
line = scene.visuals.Line(pos=xy, connect=toconnect)
You only add one object to your canvas, but the control per line is more limited.
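To apply this to many separate polylines, the vertices can be stacked into one array and a matching connect array built so that no segment joins the end of one line to the start of the next. The sketch below uses only NumPy; the helper name merge_polylines is hypothetical, and the final VisPy call is left as a comment because it needs a canvas.

```python
import numpy as np


def merge_polylines(lines):
    """Stack polylines into one vertex array plus an index array of segments."""
    pos = np.concatenate(lines)
    segments = []
    offset = 0
    for line in lines:
        n = len(line)
        # Connect each vertex of this line to its successor, and nothing else.
        idx = np.arange(offset, offset + n - 1)
        segments.append(np.column_stack([idx, idx + 1]))
        offset += n
    connect = np.concatenate(segments)
    return pos, connect


# Two made-up polylines with 4 and 3 points.
lines = [np.random.rand(4, 2), np.random.rand(3, 2)]
pos, connect = merge_polylines(lines)

# With a VisPy canvas available, everything could then be drawn as one visual:
# line = scene.visuals.Line(pos=pos, connect=connect, parent=canvas.scene)
```

Per-line properties then have to be handled through per-vertex data, which is the limitation mentioned above.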

matplotlib pyplot side-by-side graphics

I'm trying to put two scatterplots side-by-side in the same figure. I'm also using prettyplotlib to make the graphs look a little nicer. Here is the code
fig, ax = ppl.subplots(ncols=2,nrows=1,figsize=(14,6))
for each in ['skimmer','dos','webapp','losstheft','espionage','crimeware','misuse','pos']:
    ypos = df[df['pattern']==each]['ypos_m']
    xpos = df[df['pattern']==each]['xpos_m']
    ax[0] = ppl.scatter(ypos,xpos,label=each)
plt.title("Multi-dimensional Scaling: Manhattan")
for each in ['skimmer','dos','webapp','losstheft','espionage','crimeware','misuse','pos']:
    ypos = df[df['pattern']==each]['ypos_e']
    xpos = df[df['pattern']==each]['xpos_e']
    ax[1] = ppl.scatter(ypos,xpos,label=each)
plt.title("Multi-dimensional Scaling: Euclidean")
plt.show()
I don't get any error when the code runs, but what I end up with is one row with two graphs. One graph is completely empty and not styled by prettyplotlib at all. The right side graphic seems to have both of my scatterplots in it.
I know that ppl.subplots is returning a matplotlib.figure.Figure and a numpy array consisting of two matplotlib.axes.AxesSubplot. But I also admit that I don't quite get how axes and subplotting works. Hopefully it's just a simple mistake somewhere.
I think ax[0] = ppl.scatter(ypos,xpos,label=each) should be ax[0].scatter(ypos,xpos,label=each) and ax[1] = ppl.scatter(ypos,xpos,label=each) should be ax[1].scatter(ypos,xpos,label=each). Change those and see if your problem is solved.
I am quite sure that the issue is this: you are calling ppl.scatter(...), which draws on the current axes. After the subplots call, the current axes is the last one created, which is the right one, so both scatterplots end up there.
Also, you may find that in the end the ax array contains two matplotlib.collections.PathCollection objects, not two axes as you might expect.
Since the solution above removes the prettiness of prettyplotlib, an alternative is to change the current working axes by adding:
plt.sca(ax[0])  # use ax[1] in the second loop
before ppl.scatter(), inside each loop.
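The plt.sca() approach can be sketched with plain Matplotlib. Here plt.scatter stands in for ppl.scatter (an assumption, since prettyplotlib may not be installed), and the data is random.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
groups = ["skimmer", "dos", "webapp"]

fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(14, 6))
for i, metric in enumerate(["Manhattan", "Euclidean"]):
    # Make ax[i] the current axes so pyplot-style calls draw on it.
    plt.sca(ax[i])
    for name in groups:
        # Placeholder data; the real code would select rows of df by pattern.
        plt.scatter(rng.random(10), rng.random(10), label=name)
    plt.title("Multi-dimensional Scaling: " + metric)

# One scatter collection per group should land on each axes.
counts = [len(a.collections) for a in ax]
titles = [a.get_title() for a in ax]
plt.close(fig)
```

Because ax is never reassigned, it still holds the two axes objects after both loops.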

Add a new axis to the right/left/top-right of an axis

How do you add an axis to the outside of another axis, keeping it within the figure as a whole? legend and colorbar both have this capability, but implemented in rather complicated (and for me, hard to reproduce) ways.
You can use the subplot command to achieve this. It can be as simple as py.subplot(2,2,1), where the first two numbers describe the geometry of the plots (2x2) and the third is the current plot number. In general it is better to be explicit, as in the following example:
import pylab as py
# Make some data
x = py.linspace(0,10,1000)
cos_x = py.cos(x)
sin_x = py.sin(x)
# Initiate a figure, there are other options in addition to figsize
fig = py.figure(figsize=(6,6))
# Plot the first set of data on ax1
ax1 = fig.add_subplot(2,1,1)
ax1.plot(x,sin_x)
# Plot the second set of data on ax2
ax2 = fig.add_subplot(2,1,2)
ax2.plot(x,cos_x)
# This final line can be used to adjust the subplots; if uncommented, it will remove all white space between them
#fig.subplots_adjust(left=0.13, right=0.9, top=0.9, bottom=0.12,hspace=0.0,wspace=0.0)
Notice that this means things like py.xlabel may not work as expected, since you have two axes. Instead you need to specify ax1.set_xlabel(".."), which also makes the code easier to read.
More examples can be found here.
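For attaching a new axes directly to one side of an existing one, a different technique from the subplot approach above is make_axes_locatable from mpl_toolkits.axes_grid1, which ships with Matplotlib. The following is a minimal sketch under that assumption; the size and pad values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

x = np.linspace(0, 10, 1000)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(x, np.sin(x))

# Carve a new axes out of the space on the right of `ax`.
divider = make_axes_locatable(ax)
ax_right = divider.append_axes("right", size="20%", pad=0.3)
ax_right.plot(np.cos(x), x)
ax_right.set_xlabel("cos(x)")

n_axes = len(fig.axes)
plt.close(fig)
```

Passing "left", "top", or "bottom" instead of "right" places the new axes on the corresponding side.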