What does ax=ax do while creating a plot in matplotlib? - matplotlib

I have a DataFrame of Heart Disease patients, which has over 300 values. What I have done initially is filter the patients aging over 50. Now I am trying to plot that DF, but running on Google, I found this piece of code that helped me plotting it.
But I am not able to understand the concept of ax = ax here:
fig, ax = plt.subplots()
over_50.plot(x="age",
y="chol",
c="target",
kind="scatter",
---------> ax=ax); <---------
I want to learn the concept behind this little piece of code here. What is it doing at its core?

In this case (a single axes plot) you can do without this parameter.
But there are more complex cases, when you create subplots with
a number of axes objects (a grid).
In this case ax (the second result from plt.subplots()) is an array
of axes objects.
Then, creating each plot, you should specify in which axes this plot
is to be created.
See e.g. https://matplotlib.org/3.1.0/gallery/subplots_axes_and_figures/subplots_demo.html
and find title Stacking subplots in one direction.
It contains such example:
fig, axs = plt.subplots(2)
fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)
axs[1].plot(x, -y)
Here:
there is created a figure composed of 2 columns,
in the first axes there is created one line plot, and in the second - another plot.
Alternative form of how to specify axes object in which particular plot
is to be created is just ax parameter, like in our code,
where you can pass one of axes objects from the current figure.

Related

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Scatter plot without x-axis

I am trying to visualize some data and have built a scatter plot with this code -
sns.regplot(y="Calls", x="clientid", data=Drop)
This is the output -
I don't want it to consider the x-axis. I just want to see how the data lie w.r.t y-axis. Is there a way to do that?
As #iayork suggested, you can see the distribution of your points with a striplot or a swarmplot (you could also combine them with a violinplot). If you need to move the points closer to the y-axis, you can simply adjust the size of the figure so that the width is small compared to the height (here i'm doing 2 subplots on a 4x5 in figure, which means that each plot is roughly 2x5 in).
fig, (ax1,ax2) = plt.subplots(1,2, figsize=(4,5))
sns.stripplot(d, orient='vert', ax=ax1)
sns.swarmplot(d, orient='vert', ax=ax2)
plt.tight_layout()
However, I'm going to suggest that maybe you want to use distplot instead. This function is specifically created to show the distribution of you data. Here i'm plotting the KDE of the data, as well as the "rugplot", which shows the position of the points along the y-axis:
fig = plt.figure()
sns.distplot(d, kde=True, vertical=True, rug=True, hist=False, kde_kws=dict(shade=True), rug_kws=dict(lw=2, color='orange'))

Second Matplotlib figure doesn't save to file

I've drawn a plot that looks something like the following:
It was created using the following code:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# 1. Plot a figure consisting of 3 separate axes
# ==============================================
plotNames = ['Plot1','Plot2','Plot3']
figure, axisList = plt.subplots(len(plotNames), sharex=True, sharey=True)
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range('2015-01-01','2015-12-31',freq='D')
tempDF['value'] = np.random.randn(tempDF['date'].size)
tempDF['value2'] = np.random.randn(tempDF['date'].size)
for i in range(len(plotNames)):
axisList[i].plot_date(tempDF['date'],tempDF['value'],'b-',xdate=True)
# 2. Create a new single axis in the figure. This new axis sits over
# the top of the axes drawn previously. Make all the components of
# the new single axis invisibe except for the x and y labels.
big_ax = figure.add_subplot(111)
big_ax.set_axis_bgcolor('none')
big_ax.set_xlabel('Date',fontweight='bold')
big_ax.set_ylabel('Random normal',fontweight='bold')
big_ax.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
big_ax.spines['right'].set_visible(False)
big_ax.spines['top'].set_visible(False)
big_ax.spines['left'].set_visible(False)
big_ax.spines['bottom'].set_visible(False)
# 3. Plot a separate figure
# =========================
figure2,ax2 = plt.subplots()
ax2.plot_date(tempDF['date'],tempDF['value2'],'-',xdate=True,color='green')
ax2.set_xlabel('Date',fontweight='bold')
ax2.set_ylabel('Random normal',fontweight='bold')
# Save plot
# =========
plt.savefig('tempPlot.png',dpi=300)
Basically, the rationale for plotting the whole picture is as follows:
Create the first figure and plot 3 separate axes using a loop
Plot a single axis in the same figure to sit on top of the graphs
drawn previously. Label the x and y axes. Make all other aspects of
this axis invisible.
Create a second figure and plot data on a single axis.
The plot displays just as I want when using jupyter-notebook but when the plot is saved, the file contains only the second figure.
I was under the impression that plots could have multiple figures and that figures could have multiple axes. However, I suspect I have a fundamental misunderstanding of the differences between plots, subplots, figures and axes. Can someone please explain what I'm doing wrong and explain how to get the whole image to save to a single file.
Matplotlib does not have "plots". In that sense,
plots are figures
subplots are axes
During runtime of a script you can have as many figures as you wish. Calling plt.save() will save the currently active figure, i.e. the figure you would get by calling plt.gcf().
You can save any other figure either by providing a figure number num:
plt.figure(num)
plt.savefig("output.png")
or by having a refence to the figure object fig1
fig1.savefig("output.png")
In order to save several figures into one file, one could go the way detailed here: Python saving multiple figures into one PDF file.
Another option would be not to create several figures, but a single one, using subplots,
fig = plt.figure()
ax = plt.add_subplot(611)
ax2 = plt.add_subplot(612)
ax3 = plt.add_subplot(613)
ax4 = plt.add_subplot(212)
and then plot the respective graphs to those axes using
ax.plot(x,y)
or in the case of a pandas dataframe df
df.plot(x="column1", y="column2", ax=ax)
This second option can of course be generalized to arbitrary axes positions using subplots on grids. This is detailed in the matplotlib user's guide Customizing Location of Subplot Using GridSpec
Furthermore, it is possible to position an axes (a subplot so to speak) at any position in the figure using fig.add_axes([left, bottom, width, height]) (where left, bottom, width, height are in figure coordinates, ranging from 0 to 1).

colorbars for grid of line (not contour) plots in matplotlib

I'm having trouble giving colorbars to a grid of line plots in Matplotlib.
I have a grid of plots, which each shows 64 lines. The lines depict the penalty value vs time when optimizing the same system under 64 different values of a certain hyperparameter h.
Since there are so many lines, instead of using a standard legend, I'd like to use a colorbar, and color the lines by the value of h. In other words, I'd like something that looks like this:
The above was done by adding a new axis to hold the colorbar, by calling figure.add_axes([0.95, 0.2, 0.02, 0.6]), passing in the axis position explicitly as parameters to that method. The colorbar was then created as in the example code here, by instantiating a ColorbarBase(). That's fine for single plots, but I'd like to make a grid of plots like the one above.
To do this, I tried doubling the number of subplots, and using every other subplot axis for the colorbar. Unfortunately, this led to the colorbars having the same size/shape as the plots:
Is there a way to shrink just the colorbar subplots in a grid of subplots like the 1x2 grid above?
Ideally, it'd be great if the colorbar just shared the same axis as the line plot it describes. I saw that the colorbar.colorbar() function has an ax parameter:
ax
parent axes object from which space for a new colorbar axes will be stolen.
That sounds great, except that colorbar.colorbar() requires you to pass in a imshow image, or a ContourSet, but my plot is neither an image nor a contour plot. Can I achieve the same (axis-sharing) effect using ColorbarBase?
It turns out you can have different-shaped subplots, so long as all the plots in a given row have the same height, and all the plots in a given column have the same width.
You can do this using gridspec.GridSpec, as described in this answer.
So I set the columns with line plots to be 20x wider than the columns with color bars. The code looks like:
grid_spec = gridspec.GridSpec(num_rows,
num_columns * 2,
width_ratios=[20, 1] * num_columns)
colormap_type = cm.cool
for (x_vec_list,
y_vec_list,
color_hyperparam_vec,
plot_index) in izip(x_vec_lists,
y_vec_lists,
color_hyperparam_vecs,
range(len(x_vecs))):
line_axis = plt.subplot(grid_spec[grid_index * 2])
colorbar_axis = plt.subplot(grid_spec[grid_index * 2 + 1])
colormap_normalizer = mpl.colors.Normalize(vmin=color_hyperparam_vec.min(),
vmax=color_hyperparam_vec.max())
scalar_to_color_map = mpl.cm.ScalarMappable(norm=colormap_normalizer,
cmap=colormap_type)
colorbar.ColorbarBase(colorbar_axis,
cmap=colormap_type,
norm=colormap_normalizer)
for (line_index,
x_vec,
y_vec) in zip(range(len(x_vec_list)),
x_vec_list,
y_vec_list):
hyperparam = color_hyperparam_vec[line_index]
line_color = scalar_to_color_map.to_rgba(hyperparam)
line_axis.plot(x_vec, y_vec, color=line_color, alpha=0.5)
For num_rows=1 and num_columns=1, this looks like:

matplotlib using twinx and twiny together (like twinxy)

Can I have both twinx and twiny together (i.e. something like twinxy)?
I want to put a CDF on a bar plot where the X axis of the bar plot is in log-scale. I cannot make the Ys together, because the bar plot y range is very large comparing [0,1] for CDF.
Any ideas?
Thanks,
If I understand your question right, you want to plot two things on the same axes with no shared axis. There is probably a better way to do this, but you can stack twinx (doc) and twiny (doc) as such
ax # your first axes
ax_new = ax.twinx().twiny()
Which will give you tick marks on all sides of the plot. ax will plot against the bottom and left, ax_new will plot against the top and right.