Add a new axis to the right/left/top-right of an axis - matplotlib

How do you add an axis to the outside of another axis, keeping it within the figure as a whole? legend and colorbar both have this capability, but implemented in rather complicated (and for me, hard to reproduce) ways.

You can use the subplots command to achieve this, this can be as simple as py.subplot(2,2,1) where the first two numbers describe the geometry of the plots (2x2) and the third is the current plot number. In general it is better to be explicit as in the following example
import pylab as py
# Make some data
x = py.linspace(0,10,1000)
cos_x = py.cos(x)
sin_x = py.sin(x)
# Initiate a figure, there are other options in addition to figsize
fig = py.figure(figsize=(6,6))
# Plot the first set of data on ax1
ax1 = fig.add_subplot(2,1,1)
ax1.plot(x,sin_x)
# Plot the second set of data on ax2
ax2 = fig.add_subplot(2,1,2)
ax2.plot(x,cos_x)
# This final line can be used to adjust the subplots, if uncommentted it will remove all white space
#fig.subplots_adjust(left=0.13, right=0.9, top=0.9, bottom=0.12,hspace=0.0,wspace=0.0)
Notice that this means things like py.xlabel may not work as expected since you have two axis. Instead you need to specify ax1.set_xlabel("..") this makes the code easier to read.
More examples can be found here.

Related

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Scatter plot without x-axis

I am trying to visualize some data and have built a scatter plot with this code -
sns.regplot(y="Calls", x="clientid", data=Drop)
This is the output -
I don't want it to consider the x-axis. I just want to see how the data lie w.r.t y-axis. Is there a way to do that?
As #iayork suggested, you can see the distribution of your points with a striplot or a swarmplot (you could also combine them with a violinplot). If you need to move the points closer to the y-axis, you can simply adjust the size of the figure so that the width is small compared to the height (here i'm doing 2 subplots on a 4x5 in figure, which means that each plot is roughly 2x5 in).
fig, (ax1,ax2) = plt.subplots(1,2, figsize=(4,5))
sns.stripplot(d, orient='vert', ax=ax1)
sns.swarmplot(d, orient='vert', ax=ax2)
plt.tight_layout()
However, I'm going to suggest that maybe you want to use distplot instead. This function is specifically created to show the distribution of you data. Here i'm plotting the KDE of the data, as well as the "rugplot", which shows the position of the points along the y-axis:
fig = plt.figure()
sns.distplot(d, kde=True, vertical=True, rug=True, hist=False, kde_kws=dict(shade=True), rug_kws=dict(lw=2, color='orange'))

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisibe except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_axis_bgcolor('none')
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
Create the first figure and plot 3 separate axes using a loop
Plot a single axis in the same figure to sit on top of the graphs
drawn previously. Label the x and y axes. Make all other aspects of
this axis invisible.
Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and explain how to get the whole image to save to a single file.
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During runtime of a script you can have as many figures as you wish. Calling plt.save() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a refence to the figure object fig1
fig1.savefig("output.png")
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = plt.add_subplot(611)
ax2 = plt.add_subplot(612)
ax3 = plt.add_subplot(613)
ax4 = plt.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide Customizing Location of Subplot Using GridSpec
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).

Matplotlib annotate doesn't work on log scale?

I am making log-log plots for different data sets and need to include the best fit line equation. I know where in the plot I should place the equation, but since the data sets have very different values, I'd like to use relative coordinates in the annotation. (Otherwise, the annotation would move for every data set.)
I am aware of the annotate() function of matplotlib, and I know that I can use textcoords='axes fraction' to enable relative coordinates. When I plot my data on the regular scale, it works. But then I change at least one of the scales to log and the annotation disappears. I get no error message.
Here's my code:
plt.clf()
samplevalues = [100,1000,5000,10^4]
ax = plt.subplot(111)
ax.plot(samplevalues,samplevalues,'o',color='black')
ax.annotate('hi',(0.5,0.5), textcoords='axes fraction')
ax.set_xscale('log')
ax.set_yscale('log')
plt.show()
If I comment out ax.set_xcale('log') and ax.set_ycale('log'), the annotation appears right in the middle of the plot (where it should be). Otherwise, it doesn't appear.
Thanks in advance for your help!
It may really be a bug as pointed out by #tcaswell in the comment but a workaround is to use text() in axis coords:
plt.clf()
samplevalues = [100,1000,5000,10^4]
ax = plt.subplot(111)
ax.loglog(samplevalues,samplevalues,'o',color='black')
ax.text(0.5, 0.5,'hi',transform=ax.transAxes)
plt.show()
Another approach is to use figtext() but that is more cumbersome to use if there are already several plots (panels).
By the way, in the code above, I plotted the data using log-log scale directly. That is, instead of:
ax.plot(samplevalues,samplevalues,'o',color='black')
ax.set_xscale('log')
ax.set_yscale('log')
I did:
ax.loglog(samplevalues,samplevalues,'o',color='black')

Multiplot with matplotlib without knowing the number of plots before running

I have a problem with Matplotlib's subplots. I do not know the number of subplots I want to plot beforehand, but I know that I want them in two rows. so I cannot use
plt.subplot(212)
because I don't know the number that I should provide.
It should look like this:
Right now, I plot all the plots into a folder and put them together with illustrator, but there has to be a better way with Matplotlib. I can provide my code if I was unclear somewhere.
My understanding is that you only know the number of plots at runtime and hence are struggling with the shorthand syntax, e.g.:
plt.subplot(121)
Thankfully, to save you having to do some awkward math to figure out this number programatically, there is another interface which allows you to use the form:
plt.subplot(n_cols, n_rows, plot_num)
So in your case, given you want n plots, you can do:
n_plots = 5 # (or however many you programatically figure out you need)
n_cols = 2
n_rows = (n_plots + 1) // n_cols
for plot_num in range(n_plots):
ax = plt.subplot(n_cols, n_rows, plot_num)
# ... do some plotting
Alternatively, there is also a slightly more pythonic interface which you may wish to be aware of:
fig, subplots = plt.subplots(n_cols, n_rows)
for ax in subplots:
# ... do some plotting
(Notice that this was subplots() not the plain subplot()). Although I must admit, I have never used this latter interface.
HTH