How to gain control over annotate arrows - matplotlib

I'm trying to insert arrows (brackets) in plots using the annotate package, but I cannot figure out what the input parameters mean. I read the documentation and I'm still unsure of how to control the arrows. Here's an example starting point:
import matplotlib.pyplot as pl
import numpy as np
fig = pl.figure(figsize=(3.25, 2.5))
ax0 = fig.add_subplot(111)
x, y = np.arange(10), np.arange(10) * -1
for offset in range(5):
ax0.plot(x + offset, y, lw=1)
# add annotation arrow
bbox = dict(facecolor="w",
alpha=0.95,
ls="None",
boxstyle="round",
pad=0.1)
ax0.annotate(text="Example",
xy=(7.5, -5),
xytext=(0, -9),
arrowprops=dict(arrowstyle="-[",
linewidth=1,
connectionstyle="arc,armA=90,angleA=0,angleB=-40,armB=85,rad=0"),
verticalalignment="bottom",
horizontalalignment="left",
fontsize=8,
bbox=bbox)
fig.show()
I want the bracket to span the width of all of the drawn lines (as if to say "these" lines are what the annotation refers to), but I cannot figure out how to change the bracket width.
Another issue is interpreting armA and armB (the arrow lines currently look ugly). I understand these refer to the length of line segments, but I cannot figure out what the units are (pixels?), much less how to automate generating their lengths.
Can you please provide guidance on how to adjust the width of the bracket and what the connectionstyle parameters mean? If this is documented somewhere I would appreciate the reference (even if it comes with a RTFM-type comment).

I think the parameter you want is mutation_scale.
I changed your annotate command to this and I think it looks reasonable now, but it took some manual adjustment. If you had a consistent pattern in multiple figures you could probably calculate the angles and lengths that you want and use them as inputs, but for your example this seems to work reasonably well.
ax0.annotate(text="Example",
xy=(8.5, -6.5),
xytext=(0, -9),
arrowprops=dict(arrowstyle="-[",
linewidth=1,
mutation_scale=22,
connectionstyle="arc,armA=70, \
angleA=0, \
angleB=-45, \
armB=50, \
rad=0"), \
verticalalignment="bottom",
horizontalalignment="left",
fontsize=8,
bbox=bbox)

Related

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

How do I use colourmaps with variable alpha in a Seaborn kdeplot without seeing the contour lines?

Python version: 3.6.4 (Anaconda on Windows)
Seaborn: 0.8.1
Matplotlib: 2.1.2
I'm trying to create a 2D Kernel Density plot using Seaborn but I want each step in the colourmap to have a different alpha value. I had a look at this question to create a matplotlib colourmap with alpha values: Add alpha to an existing matplotlib colormap.
I have a problem in that the lines between contours are visible. The result I get is here:
I thought that I had found the answer when I found this question: Hide contour linestroke on pyplot.contourf to get only fills. I tried the method outlined in the answer (using set_edgecolor("face") but it did not work in this case. That question also seemed to be related to vector graphics formats and I am just writing out a PNG.
Here is my script:
import numpy as np
import seaborn as sns
import matplotlib.colors as cols
import matplotlib.pyplot as plt
def alpha_cmap(cmap):
my_cmap = cmap(np.arange(cmap.N))
# Set a square root alpha.
x = np.linspace(0, 1, cmap.N)
my_cmap[:,-1] = x ** (0.5)
my_cmap = cols.ListedColormap(my_cmap)
return my_cmap
xs = np.random.uniform(size=100)
ys = np.random.uniform(size=100)
kplot = sns.kdeplot(data=xs, data2=ys,
cmap=alpha_cmap(plt.cm.viridis),
shade=True,
shade_lowest=False,
n_levels=30)
plt.savefig("example_plot.png")
Guided by some comments on this question I have tried some other methods that have been successful when this problem has come up. Based on this question (Matplotlib Contourf Plots Unwanted Outlines when Alpha < 1) I have tried altering the plot call to:
sns.kdeplot(data=xs, data2=ys,
cmap=alpha_cmap(plt.cm.viridis),
shade=True,
shade_lowest=False,
n_levels=30,
antialiased=True)
With antialiased=True the lines between contours are replaced by a narrow white line:
I have also tried an approach similar to this question - Pyplot pcolormesh confused when alpha not 1. This approach is based on looping over the PathCollections in kplot.collections and tuning the parameters of the edges so that they become invisible. I have tried adding this code and tweaking the linewidth -
for thing in kplot.collections:
thing.set_edgecolor("face")
thing.set_linewidth(0.01)
fig.canvas.draw()
This results in a mix of white and dark lines - .
I believe that I will not be able to tune the line width to make the lines disappear because of the variable width of the contour bands.
Using both methods (antialiasing + linewidth) makes this version, which looks cool but isn't quite what I want:
I also found this question - Changing Transparency of/Remove Contour Lines in Matplotlib
This one suggests overplotting a second plot with a different number of contour levels on the same axis, like:
kplot = sns.kdeplot(data=xs, data2=ys,
ax=ax,
cmap=alpha_cmap(plt.cm.viridis),
shade=True,
shade_lowest=False,
n_levels=30,
antialiased=True)
kplot = sns.kdeplot(data=xs, data2=ys,
ax=ax,
cmap=alpha_cmap(plt.cm.viridis),
shade=True,
shade_lowest=False,
n_levels=35,
antialiased=True)
This results in:
This is better, and almost works. The problem here is I need variable (and non-linear) alpha throughout the colourmap. The variable banding and lines seem to be a result of the combinations of alpha when contours are plotted over each other. I also still see some clear/white lines in the result.

Matplotlib annotate doesn't work on log scale?

I am making log-log plots for different data sets and need to include the best fit line equation. I know where in the plot I should place the equation, but since the data sets have very different values, I'd like to use relative coordinates in the annotation. (Otherwise, the annotation would move for every data set.)
I am aware of the annotate() function of matplotlib, and I know that I can use textcoords='axes fraction' to enable relative coordinates. When I plot my data on the regular scale, it works. But then I change at least one of the scales to log and the annotation disappears. I get no error message.
Here's my code:
plt.clf()
samplevalues = [100,1000,5000,10^4]
ax = plt.subplot(111)
ax.plot(samplevalues,samplevalues,'o',color='black')
ax.annotate('hi',(0.5,0.5), textcoords='axes fraction')
ax.set_xscale('log')
ax.set_yscale('log')
plt.show()
If I comment out ax.set_xcale('log') and ax.set_ycale('log'), the annotation appears right in the middle of the plot (where it should be). Otherwise, it doesn't appear.
Thanks in advance for your help!
It may really be a bug as pointed out by #tcaswell in the comment but a workaround is to use text() in axis coords:
plt.clf()
samplevalues = [100,1000,5000,10^4]
ax = plt.subplot(111)
ax.loglog(samplevalues,samplevalues,'o',color='black')
ax.text(0.5, 0.5,'hi',transform=ax.transAxes)
plt.show()
Another approach is to use figtext() but that is more cumbersome to use if there are already several plots (panels).
By the way, in the code above, I plotted the data using log-log scale directly. That is, instead of:
ax.plot(samplevalues,samplevalues,'o',color='black')
ax.set_xscale('log')
ax.set_yscale('log')
I did:
ax.loglog(samplevalues,samplevalues,'o',color='black')

Dotted line style from non-evenly distributed data

I'm new to Python and MatPlotlib.
This is my first posting to Stackoverflow - I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how MatplotLib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround which will make the dots on the dotted line type appear evenly distributed; even if both x and y data are non-evenly distributed; whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x-array results in the line be 'cluttered'. But you draw a dotted line repeatedly over the same location, so it seems like the normal behavior to me that it gets smeared at places where there are multiple dotted lines on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since its a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The last should be faster if x1 becomes very large.
With regard to your last comment. In your example you use whats called the 'state-machine' environment for plotting. It means that specified commands are applied to the active figure and the active axes (subplots).
You can also consider the OO approach where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. Its useful when passing an axes to a function for example.
In your example both would work equally well and it would be more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...

Change figsize in matplotlib polar contourf

I am using the following example Example to create two polar contour subplots. When I create as the pdf there is a lot of white space which I want to remove by changing figsize.
I know how to change figsize usually but I am having difficulty seeing where to put it in this code example. Any guidance or hint would be greatly appreciated.
Many thanks!
import numpy as np
import matplotlib.pyplot as plt
#-- Generate Data -----------------------------------------
# Using linspace so that the endpoint of 360 is included...
azimuths = np.radians(np.linspace(0, 360, 20))
zeniths = np.arange(0, 70, 10)
r, theta = np.meshgrid(zeniths, azimuths)
values = np.random.random((azimuths.size, zeniths.size))
#-- Plot... ------------------------------------------------
fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
ax.contourf(theta, r, values)
plt.show()
Another way to do this would be to use the figsize kwarg in your call to plt.subplots.
fig, ax = plt.subplots(figsize=(6,6), subplot_kw=dict(projection='polar')).
Those values are in inches, by the way.
You can easily just put plt.figsize(x,y) at the beginning of the code, and it will work. plt.figsize changes the size of all future plots, not just the current plot.
However, I think your problem is not what you think it is. There tends to be quite a bit of whitespace in generated PDFs unless you change options around. I usually use
plt.savefig( 'name.pdf', bbox_inches='tight', pad_inches=0 )
This gives as little whitespace as possible. bbox_inches='tight' tries to make the bounding box as small as possible, while pad_inches sets how many inches of whitespace there should be padding it. In my case I have no extra padding at all, as I add padding in whatever I'm using the figure for.