Auto-resize Figure in Seaborn - pandas

I am looking for an option to automatically resize the figures that I generate with seaborn (barplot, countplot, boxplot). I create all the plots in one go, but in some of the graphs the labels and bars are tightly packed because some columns have too many categorical values. I am using the code below:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

for col in dff.drop(target_col_name, axis=1).columns:
    # Low-cardinality columns are treated as categorical
    if dff[col].nunique() / len(dff[col]) < threshold:
        ax = sns.countplot(x=dff[col], hue=dff[target_col_name])
        ax.tick_params(axis='x', labelrotation=90)
        plt.tight_layout()
        plt.show()
        # Share of each target class within each category
        pd.crosstab(index=dff[col],
                    columns=dff[target_col_name], normalize='index').plot.bar()
        plt.tight_layout()
        plt.show()
    elif dff[col].dtype == 'int64' or dff[col].dtype == 'float64':
        sns.boxplot(x=dff[target_col_name], y=dff[col])
        plt.show()
One solution is to increase figsize for every figure, or to add another if condition that targets the columns with many categories and enlarges only those figures.
But I am looking for a more flexible solution, in which every figure is resized automatically based on the information it contains.

I have used Matplotlib's figure() function (plt.figure), which lets you set the size of a chart. All you need to do is call it right before the code for your chart.
For instance, plt.figure(figsize=(12,5)) sets the width of the chart to 12 inches and the height to 5 inches (figsize is (width, height)).
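To make the size follow the content, as the question asks, the figsize can be computed from the number of categories before plotting. The sketch below assumes a fixed number of inches per category; the helper name auto_sized_axes and its defaults (0.4 inch per category, 6 to 30 inches wide, 5 inches tall) are hypothetical and should be tuned.

```python
import matplotlib.pyplot as plt


def auto_sized_axes(n_categories, inches_per_category=0.4,
                    min_width=6.0, max_width=30.0, height=5.0):
    """Hypothetical helper: create (fig, ax) whose width grows with the category count."""
    # Clamp the width so tiny columns stay readable and huge ones stay finite
    width = min(max_width, max(min_width, n_categories * inches_per_category))
    fig, ax = plt.subplots(figsize=(width, height))
    return fig, ax
```

In the loop from the question, you would call fig, ax = auto_sized_axes(dff[col].nunique()) and then pass the axes explicitly, e.g. sns.countplot(x=dff[col], hue=dff[target_col_name], ax=ax); pandas' .plot.bar() also accepts ax=ax.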

Related

Is it possible to break x and y axis at the same time on lineplot?

I am working on drawing lineplots with matplotlib.
I checked several posts and could understand how the line break works on matplotlib (Break // in x axis of matplotlib)
However, I was wondering is it possible to break x and y axis all together at the same time.
My current plot looks like this:
As shown in the graph, the x-axis range [2000, 5000] wastes a lot of space.
Because I have more data to draw after 7000, I want to save more space.
Is it possible to split the x-axis together with the y-axis?
Or is there another convenient way to avoid showing a specific region of a line plot?
If another library enables this, I am willing to drop matplotlib and adopt it.
Maybe splitting the axis isn't your best choice. I would try inserting a smaller set of axes (an inset) into the empty space of your large plot using add_axes(). Here is a small example:
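import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 5000, 1000)  # create 1000 time stamps
data = 5*t*np.exp(-t/100)  # and some fake data
fig, ax = plt.subplots()
ax.plot(t, data)
# Size the inset at 60% of the main axes and place it at (0.35, 0.35) in figure coordinates
box = ax.get_position()
width = box.width*0.6
height = box.height*0.6
x = 0.35
y = 0.35
subax = fig.add_axes([x, y, width, height])
subax.plot(t, data)
# Zoom the inset on the first 10% of the time range
subax.axis([0, np.max(t)/10, 0, np.max(data)*1.1])
plt.show()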
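If a true break on both axes is still wanted, one common approach (a swapped-in technique, not the one in the answer above) follows Matplotlib's "broken axis" gallery idea, extended to a 2x2 grid: draw the same data in four panels that share x by column and y by row, give each panel its own x and y range, and hide the inner spines. The function broken_axes_plot and the example ranges are hypothetical; diagonal "cut" marks are left out for brevity.

```python
import matplotlib.pyplot as plt
import numpy as np


def broken_axes_plot(x, y, xlims, ylims):
    """Plot y vs x on a 2x2 grid that skips one x-range and one y-range.

    xlims: ((left_min, left_max), (right_min, right_max))
    ylims: ((bottom_min, bottom_max), (top_min, top_max))
    """
    fig, axes = plt.subplots(2, 2, sharex="col", sharey="row")
    # Draw the same data in every panel first, then restrict each panel's view
    for ax in axes.flat:
        ax.plot(x, y)
    for row in range(2):
        for col in range(2):
            axes[row, col].set_xlim(*xlims[col])
            axes[row, col].set_ylim(*ylims[1 - row])  # row 0 is the top panel
    # Hide spines and ticks along the inner edges so the panels read as one plot
    for ax in axes[:, 0]:
        ax.spines["right"].set_visible(False)
    for ax in axes[:, 1]:
        ax.spines["left"].set_visible(False)
        ax.tick_params(left=False, labelleft=False)
    for ax in axes[0, :]:
        ax.spines["bottom"].set_visible(False)
        ax.tick_params(bottom=False, labelbottom=False)
    for ax in axes[1, :]:
        ax.spines["top"].set_visible(False)
    fig.subplots_adjust(wspace=0.05, hspace=0.05)
    return fig, axes


# Example: skip x in (2000, 5000) and y in (800, 1200)
t = np.linspace(0, 9000, 1000)
data = 5 * t * np.exp(-t / 1000)
fig, axes = broken_axes_plot(t, data,
                             xlims=((0, 2000), (5000, 9000)),
                             ylims=((0, 800), (1200, 2000)))
```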

Add axes to a figure with a fixed size

I would like to create a figure where subplots are added dynamically within a for-loop. It should be possible to define the width and height of each subplot in centimeters, that is, the more subplots are added, the bigger the figure needs to be to make room for 'incoming' subplots.
In my case, subplots should be added row-wise, so the figure has to grow in the y-dimension. I came across this Stack Overflow post, which might point in the right direction. Maybe the gridspec module could also solve this problem?
I tried the code described in the first post, but it didn't solve my problem: it sets the final figure size, but the more subplots are added to the figure, the smaller each subplot gets, as shown in this example:
import matplotlib.pyplot as plt

# set number of plots
n_subplots = 2

def set_size(w, h, ax=None):
    """ w, h: width, height in inches """
    if not ax:
        ax = plt.gca()
    l = ax.figure.subplotpars.left
    r = ax.figure.subplotpars.right
    t = ax.figure.subplotpars.top
    b = ax.figure.subplotpars.bottom
    figw = float(w)/(r-l)
    figh = float(h)/(t-b)
    ax.figure.set_size_inches(figw, figh)

fig = plt.figure()
for idx in range(0, n_subplots):
    ax = fig.add_subplot(n_subplots, 1, idx+1)
    ax.plot([1, 3, 2])
    set_size(5, 5, ax=ax)
plt.show()
You're setting the same figure size (5,5) regardless of the number of subplots. If I understood your question correctly, I think you want to set the height to be proportional to the number of subplots.
However, you'd be better off creating the figure with the right size from the get-go. The code you provided gives the correct layout only because you know beforehand how many subplots you're going to create (in fig.add_subplot(n_subplots,...)). If you are trying to add subplots without knowing the total number of subplot rows you need, the problem is more complicated.
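import matplotlib.pyplot as plt

n_subplots = 4
ax_w = 5  # width per subplot, in inches
ax_h = 5  # height per subplot, in inches
dpi = 100
# Make the figure tall enough for every row from the start
fig = plt.figure(figsize=(ax_w, ax_h*n_subplots), dpi=dpi)
for idx in range(0, n_subplots):
    ax = fig.add_subplot(n_subplots, 1, idx+1)
    ax.plot([1, 3, 2])
fig.tight_layout()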
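For the harder case the answer mentions (rows added one at a time, count unknown in advance, size given in centimetres), one option is to place every axes by hand and recompute the figure size and all positions whenever a row is added. This is a sketch under those assumptions; the class GrowingFigure and its default sizes and margins are hypothetical. It converts centimetres to inches with 1 inch = 2.54 cm.

```python
import matplotlib.pyplot as plt

CM = 1 / 2.54  # inches per centimetre


class GrowingFigure:
    """Hypothetical helper: stacks axes row-wise, each with a fixed size in cm."""

    def __init__(self, ax_w_cm=12, ax_h_cm=5, margin_cm=1.5, gap_cm=1.0):
        self.ax_w, self.ax_h = ax_w_cm * CM, ax_h_cm * CM
        self.margin, self.gap = margin_cm * CM, gap_cm * CM
        self.fig = plt.figure()
        self.axes = []

    def add_row(self):
        # A unique label keeps older Matplotlib versions from reusing an existing axes
        ax = self.fig.add_axes([0, 0, 1, 1], label=f"row{len(self.axes)}")
        self.axes.append(ax)
        self._relayout()
        return ax

    def _relayout(self):
        n = len(self.axes)
        fig_w = self.ax_w + 2 * self.margin
        fig_h = n * self.ax_h + (n - 1) * self.gap + 2 * self.margin
        self.fig.set_size_inches(fig_w, fig_h)
        for i, ax in enumerate(self.axes):
            # Bottom edge of row i in inches (row 0 is at the top)
            bottom = fig_h - self.margin - (i + 1) * self.ax_h - i * self.gap
            # Convert inches to figure fractions for set_position
            ax.set_position([self.margin / fig_w, bottom / fig_h,
                             self.ax_w / fig_w, self.ax_h / fig_h])
```

Usage: create gf = GrowingFigure(), then inside your loop call ax = gf.add_row() and plot on ax; every axes stays 12 cm x 5 cm while the figure grows downward. The margins are fixed, so long tick labels may need a larger margin_cm.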

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right); however, for some of them the x axis collapses slightly (see image left) or strongly (see image middle).
https://i.imgur.com/dxLR4B4.png
Looking at the seaborn code that draws a box/violin plot, https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing in code only?
I do not want to modify my dataframe (e.g. replace the NaNs with zero).
Here is my code:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

boxp_df = pd.read_csv(pf_in, sep="\t", skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose width differs per cancer type
(depending on whether any category is entirely NaN).
I expect every plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
| Cat1 | Cat2 | Cat3 | Cat4 | Cat5  |
|------|------|------|------|-------|
| 3.93 |      | 0.52 |      | 6.01  |
| 3.34 |      | 0.89 |      | 2.89  |
| 3.39 |      | 1.96 |      | 4.63  |
| 1.59 |      | 3.66 |      | 3.75  |
| 2.73 |      | 0.39 |      | 2.87  |
| 0.08 |      | 1.25 |      | -0.27 |
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
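For instance:
import seaborn as sns

tips = sns.load_dataset("tips")
# 'Freedom Day' and "Durin's Day" are not in the data, but still get a slot on the x axis
ax = sns.violinplot(x="day", y="total_bill", data=tips,
                    order=['Thur', 'Fri', 'Sat', 'Freedom Day', 'Sun', "Durin's Day"])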
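Given the update's finding that the long title is the cause, and that the question's code calls plt.tight_layout() (which fits the axes around all their decorations, the title included), one option is to skip tight_layout() and set fixed margins, wrapping the title so it fits. This is a sketch under those assumptions; fixed_layout_figure and its margin, font-size and wrap-width defaults are hypothetical and will probably need tuning for the rotated tick labels.

```python
import textwrap

import matplotlib.pyplot as plt


def fixed_layout_figure(title, figsize=(10, 10), wrap_at=40, fontsize=20,
                        left=0.12, right=0.95, bottom=0.2, top=0.85):
    """Hypothetical helper: the same axes box for every plot, whatever the title."""
    fig, ax = plt.subplots(figsize=figsize)
    # Fixed margins replace plt.tight_layout(), so the title cannot shrink the axes
    fig.subplots_adjust(left=left, right=right, bottom=bottom, top=top)
    # Break the long title into several lines so it fits within the figure width
    ax.set_title(textwrap.fill(title, width=wrap_at), fontsize=fontsize)
    return fig, ax
```

With the question's code, you would call fig, ax = fixed_layout_figure("VERY LONG TITLE"), then sns.violinplot(data=boxp_df, ax=ax), and save with plt.savefig(pf_out) without calling plt.tight_layout().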

colorbars for grid of line (not contour) plots in matplotlib

I'm having trouble giving colorbars to a grid of line plots in Matplotlib.
I have a grid of plots, which each shows 64 lines. The lines depict the penalty value vs time when optimizing the same system under 64 different values of a certain hyperparameter h.
Since there are so many lines, instead of using a standard legend, I'd like to use a colorbar, and color the lines by the value of h. In other words, I'd like something that looks like this:
The above was done by adding a new axis to hold the colorbar, by calling figure.add_axes([0.95, 0.2, 0.02, 0.6]), passing in the axis position explicitly as parameters to that method. The colorbar was then created as in the example code here, by instantiating a ColorbarBase(). That's fine for single plots, but I'd like to make a grid of plots like the one above.
To do this, I tried doubling the number of subplots, and using every other subplot axis for the colorbar. Unfortunately, this led to the colorbars having the same size/shape as the plots:
Is there a way to shrink just the colorbar subplots in a grid of subplots like the 1x2 grid above?
Ideally, it'd be great if the colorbar just shared the same axis as the line plot it describes. I saw that the colorbar.colorbar() function has an ax parameter:
ax
parent axes object from which space for a new colorbar axes will be stolen.
That sounds great, except that colorbar.colorbar() requires you to pass in an imshow image or a ContourSet, but my plot is neither an image nor a contour plot. Can I achieve the same (axis-sharing) effect using ColorbarBase?
It turns out you can have different-shaped subplots, so long as all the plots in a given row have the same height, and all the plots in a given column have the same width.
You can do this using gridspec.GridSpec, as described in this answer.
So I set the columns with line plots to be 20x wider than the columns with color bars. The code looks like:
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm, colorbar, gridspec

# Each line-plot column is 20x wider than the colorbar column next to it
grid_spec = gridspec.GridSpec(num_rows,
                              num_columns * 2,
                              width_ratios=[20, 1] * num_columns)
colormap_type = cm.cool
for plot_index, (x_vec_list,
                 y_vec_list,
                 color_hyperparam_vec) in enumerate(zip(x_vec_lists,
                                                        y_vec_lists,
                                                        color_hyperparam_vecs)):
    # Even grid cells hold a line plot, the following odd cell holds its colorbar
    line_axis = plt.subplot(grid_spec[plot_index * 2])
    colorbar_axis = plt.subplot(grid_spec[plot_index * 2 + 1])
    colormap_normalizer = mpl.colors.Normalize(vmin=color_hyperparam_vec.min(),
                                               vmax=color_hyperparam_vec.max())
    scalar_to_color_map = mpl.cm.ScalarMappable(norm=colormap_normalizer,
                                                cmap=colormap_type)
    colorbar.ColorbarBase(colorbar_axis,
                          cmap=colormap_type,
                          norm=colormap_normalizer)
    for line_index, (x_vec, y_vec) in enumerate(zip(x_vec_list, y_vec_list)):
        # Color each line by its hyperparameter value h
        hyperparam = color_hyperparam_vec[line_index]
        line_color = scalar_to_color_map.to_rgba(hyperparam)
        line_axis.plot(x_vec, y_vec, color=line_color, alpha=0.5)
For num_rows=1 and num_columns=1, this looks like:
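The "ideal" variant from the question, a colorbar that steals space from the line plot it describes, can also be done without GridSpec: Figure.colorbar accepts any ScalarMappable together with an ax= argument, not only images or contour sets. A minimal sketch, assuming each subplot gets its own list of curves and a NumPy array of h values; plot_line_grid and its argument layout are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm, colors


def plot_line_grid(x, curves_per_plot, h_per_plot, num_rows, num_columns, cmap="cool"):
    """Hypothetical helper: one colorbar per subplot, colored by hyperparameter h."""
    fig, axes = plt.subplots(num_rows, num_columns, squeeze=False)
    for ax, curves, h_values in zip(axes.flat, curves_per_plot, h_per_plot):
        norm = colors.Normalize(vmin=h_values.min(), vmax=h_values.max())
        mappable = cm.ScalarMappable(norm=norm, cmap=cmap)
        mappable.set_array([])  # older Matplotlib versions require an array here
        for h, y in zip(h_values, curves):
            ax.plot(x, y, color=mappable.to_rgba(h), alpha=0.5)
        # Steal space for the colorbar from this subplot only
        fig.colorbar(mappable, ax=ax)
    return fig, axes


# Example: two subplots, 64 lines each
x = np.linspace(0, 1, 20)
h = np.linspace(0.1, 1.0, 64)
curves = [hi * x for hi in h]
fig, axes = plot_line_grid(x, [curves, curves], [h, h], num_rows=1, num_columns=2)
```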

Reducing the distance between two boxplots

I'm drawing the boxplot shown below using Python and matplotlib. Is there any way I can reduce the distance between the two boxplots on the x axis?
This is the code that I'm using to get the figure above:
import os

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import rcParams

rcParams['ytick.direction'] = 'out'
rcParams['xtick.direction'] = 'out'
fig = plt.figure()
xlabels = ["CG", "EG"]
ax = fig.add_subplot(111)
# values_cg, values_eg and output_dir are defined elsewhere
ax.boxplot([values_cg, values_eg])
ax.set_xticks(np.arange(len(xlabels))+1)
ax.set_xticklabels(xlabels, rotation=45, ha='right')
fig.subplots_adjust(bottom=0.3)
ylabels = yticks = np.linspace(0, 20, 5)
ax.set_yticks(yticks)
ax.set_yticklabels(ylabels)
ax.tick_params(axis='x', pad=10)
ax.tick_params(axis='y', pad=10)
plt.savefig(os.path.join(output_dir, "output.pdf"))
And this is an example closer to what I'd like to get visually (although I wouldn't mind if the boxplots were even a bit closer to each other):
You can either change the aspect ratio of the plot or use the widths kwarg (doc), like this:
ax.boxplot([values_cg, values_eg], widths=1)
to make the boxes wider.
Try changing the aspect ratio using
ax.set_aspect(1.5) # or some other float
The larger the number, the narrower (and taller) the plot should be. From the docs:
a circle will be stretched such that the height is num times the width. aspect=1 is the same as aspect='equal'.
http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.set_aspect
When your code does:
ax.set_xticks(np.arange(len(xlabels))+1)
it puts the ticks at 1 and 2, which are also matplotlib's default box positions (1 to N), just as the boxes in your second, "wanted" example sit at 1, 2, 3 (even though you change the tick labels afterwards).
So I think an alternative solution would be to play with the xtick positions and the xlim of the plot.
For example, using
ax.set_xlim(-1.5,2.5)
widens the visible x range, so the two boxes appear closer together.
positions : array-like, optional
Sets the positions of the boxes. The ticks and limits are automatically set to match the positions. Defaults to range(1, N+1) where N is the number of boxes to be drawn.
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html
This should do the job!
As @Stevie mentioned, you can use the positions kwarg (doc) to manually set the x-coordinates of the boxes:
ax.boxplot([values_cg, values_eg], positions=[1, 1.3])
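The positions, widths and xlim suggestions above can be combined: place the boxes a small distance apart, make them narrow enough not to overlap, and trim the x-limits so no empty space is left at the sides. A minimal sketch; close_boxplots and its gap and box_width defaults are hypothetical. It labels the ticks with set_xticklabels rather than boxplot's labels argument, which newer Matplotlib versions have renamed.

```python
import matplotlib.pyplot as plt
import numpy as np


def close_boxplots(groups, labels, gap=0.3, box_width=0.2):
    """Hypothetical helper: draw boxes `gap` apart in data units."""
    positions = [i * gap for i in range(len(groups))]
    fig, ax = plt.subplots(figsize=(3, 4))
    ax.boxplot(groups, positions=positions, widths=box_width)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    # Pad the limits by one box width so the outer boxes are not clipped
    ax.set_xlim(positions[0] - box_width, positions[-1] + box_width)
    return fig, ax


rng = np.random.default_rng(0)
fig, ax = close_boxplots([rng.normal(5, 1, 50), rng.normal(7, 1, 50)], ["CG", "EG"])
```

Keeping gap a bit larger than box_width leaves a visible space between the boxes; making the two equal makes them touch.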