Create multiple columns and rows of unequal size in matplotlib - matplotlib

I need to create multiple columns and rows of unequal size in matplotlib. Here is a sample code:
a = np.random.rand(20, 20)
b = np.random.rand(20, 5)
c = np.random.rand(5, 20)
d = np.random.rand(5,5)
arrays = [a,b,c,d]
fig, axs = plt.subplots(2, 2, sharex='col', sharey= 'row', figsize=(10,10))
for ax, ar in zip(axs.flatten(), arrays):
    ax.imshow(ar)
However, I get this as a result.
The right column has images of unequal width for the first and second row, and I would want them to be equal (basically shrink the bottom right image to have the same scale as other images).
I have researched this a fair amount, but nothing seems to work. I have tried tight_layout() and some other formatting tricks, all to no avail...

You can use the gridspec height_ratios and width_ratios arguments to set the proportions the subplots should occupy.
In this case, because of the symmetry, both are simply the shape of b, i.e. (20, 5).
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(20, 20)
b = np.random.rand(20, 5)
c = np.random.rand(5, 20)
d = np.random.rand(5,5)
arrays = [a,b,c,d]
fig, axs = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10,10),
                        gridspec_kw={"height_ratios" : b.shape,
                                     "width_ratios" : b.shape})
for ax, ar in zip(axs.flatten(), arrays):
    ax.imshow(ar)
plt.show()
Or, more generally,
gridspec_kw={"height_ratios" : [a.shape[0], c.shape[0]],
"width_ratios" : [a.shape[1], b.shape[1]]}

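The general form above can be wrapped in a small helper for any rectangular grid of 2D arrays. This is only a sketch: grid_ratios is a hypothetical name, not part of matplotlib, and it assumes that all arrays in one row share the same height and all arrays in one column share the same width.

```python
import numpy as np
import matplotlib.pyplot as plt


def grid_ratios(grid):
    """Return (height_ratios, width_ratios) for a rectangular grid of 2D arrays.

    Hypothetical helper: row heights are read from the first column,
    column widths from the first row.
    """
    height_ratios = [row[0].shape[0] for row in grid]
    width_ratios = [arr.shape[1] for arr in grid[0]]
    return height_ratios, width_ratios


rng = np.random.default_rng(0)
grid = [[rng.random((20, 20)), rng.random((20, 5))],
        [rng.random((5, 20)), rng.random((5, 5))]]

heights, widths = grid_ratios(grid)
fig, axs = plt.subplots(len(heights), len(widths), sharex='col', sharey='row',
                        figsize=(10, 10),
                        gridspec_kw={"height_ratios": heights,
                                     "width_ratios": widths})
# Flatten the grid row by row so it lines up with axs.flatten()
for ax, arr in zip(axs.flatten(), [arr for row in grid for arr in row]):
    ax.imshow(arr)
```

Call plt.show() afterwards to display the figure.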
Related

change color of bar for data selection in seaborn histogram (or plt)

Let's say I have a dataframe like:
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
and I am doing the following plotting routine:
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
plt.tight_layout()
plt.show()
Which yields:
Now I want to be able to perform a data selection on the dataframe a. Let's say something like:
b = a[(a['X2'] <4)]
and highlight the selection from b in the posted histograms.
For example, if the first row of b is [32:0] for X3 and [0:5] for X2, the desired output would be:
Is it possible to do this with the above for loop and with sns? Many thanks!
EDIT: I am also happy with a matplotlib solution, if easier.
EDIT2:
If it helps, it would be similar to do the following:
b = a[(a['X3'] >38)]
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    sns.distplot(b[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
plt.tight_layout()
plt.show()
which yields the following:
However, I would like to be able to just colour those bars in the first plot in a different colour!
I also thought about setting the ylim to only the size of the blue plot so that the orange one won't distort the shape of the blue distribution, but that still wouldn't be feasible: in reality I have about 10 histograms to show, and setting ylim would be pretty much the same as sharey=True, which I'm trying to avoid so that I can show the true shape of the distributions.
I think I found the solution for this using the inspiration from the previous answer and this video:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
b = a[(a['X3'] < 30)]
hist_idx=[]
for i, c in enumerate(a.columns):
    # bin edges of the full data, using the same number of bins as the plot
    bin_ = np.histogram(a[c], bins=20)[1]
    # indices of the edges that fall inside the range covered by the selection
    hist = np.where(np.logical_and(bin_<=max(b[c]), bin_>min(b[c])))
    hist_idx.append(hist)
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    axes[1, i].hist(a[c], bins = 20)
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
for it, index in enumerate(hist_idx):
    length = len(index[0])
    for r in range(length):
        try:
            # edge k closes bar k-1, so shift the index by one
            axes[1, it].patches[index[0][r]-1].set_fc("red")
        except IndexError:
            pass
plt.tight_layout()
plt.show()
which yields the following for b = a[(a['X3'] < 30)] :
or for b = a[(a['X3'] > 36)]:
Thought I'd leave it here - although niche, might help someone in the future!
I created the following code with the understanding that the intent of your question is to add a different color to the histogram based on the data extracted under certain conditions.
Use np.histogram() to get an array of frequencies and an array of bins. Get the index of the value closest to the value of the first row of data extracted for a certain condition. Change the color of the histogram with that retrieved index. The same method can be used to deal with the other graph.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)
a = pd.DataFrame({"X3": X3, "X2":X2})
f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
b = a[(a['X2'] <4)]
hist3, bins3 = np.histogram(X3)
idx = np.abs(np.asarray(hist3) - b['X3'].head(1).values[0]).argmin()
for k in range(idx):
    axes[1,0].get_children()[k].set_color("red")
plt.tight_layout()
plt.show()
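The bin matching done in the answers above can also be sketched with the patches that ax.hist returns, which avoids index arithmetic on the bin edges. This is a plain matplotlib variant under stated assumptions, not the method of either answer: highlight_bins is a hypothetical helper, and it recolors every bin that contains at least one selected point.

```python
import numpy as np
import matplotlib.pyplot as plt


def highlight_bins(ax, values, selected, bins=20, color="red"):
    """Draw a histogram of `values` and recolor bins containing `selected` points.

    Hypothetical helper; returns the number of selected points per bin.
    """
    counts, edges, patches = ax.hist(values, bins=bins)
    # Re-bin the selection with the same edges so bins and bars line up one to one
    selected_counts, _ = np.histogram(selected, bins=edges)
    for patch, n_selected in zip(patches, selected_counts):
        if n_selected > 0:
            patch.set_facecolor(color)
    return selected_counts


rng = np.random.default_rng(2021)
x3 = rng.normal(34, 2, 200)

fig, ax = plt.subplots(figsize=(6, 3))
selected_counts = highlight_bins(ax, x3, x3[x3 > 36])
```

Because the selection is a subset of the plotted data, every selected point falls inside the histogram range. The same call can be made once per column inside the for loop from the question.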

How to create a heatmap with marginal histograms, similar to a jointplot?

I want to plot 2-dimensional scalar data, which I would usually plot using matplotlib.pyplot.imshow or sns.heatmap. Consider this example:
data = [[10, 20, 30], [50, 50, 100], [80, 60, 10]]
fig, ax = plt.subplots()
ax.imshow(data, cmap=plt.cm.YlGn)
Now I would additionally like to have one-dimensional bar plots at the top and the right side, showing the sum of the values in each column / row, just as sns.jointplot does. However, sns.jointplot seems only to work with categorical data, producing histograms (with kind='hist'), scatterplots or the like. I don't see how to use it if I want to specify the values of the cells directly. Is such a thing possible with seaborn?
The y axis in my plot is going to be days (within a month), the x axis is going to be hours. My data looks like this:
The field Cost Difference is what should make up the shade of the respective field in the plot.
Here is an approach that first creates a dummy jointplot and then uses its axes to add a heatmap and bar plots of the sums.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
D = 28
H = 24
df = pd.DataFrame({'day': np.repeat(range(1, D + 1), H),
                   'hour': np.tile(range(H), D),
                   'Cost Dif.': np.random.uniform(10, 1000, D * H)})
# change the random df to have some rows/columns stand out (debugging, checking)
df.loc[df['hour'] == 10, 'Cost Dif.'] = 150
df.loc[df['hour'] == 12, 'Cost Dif.'] = 250
df.loc[df['day'] == 20, 'Cost Dif.'] = 800
g = sns.jointplot(data=df, x='day', y='hour', kind='hist', bins=(D, H))
g.ax_marg_y.cla()
g.ax_marg_x.cla()
sns.heatmap(data=df['Cost Dif.'].to_numpy().reshape(D, H).T, ax=g.ax_joint, cbar=False, cmap='Blues')
g.ax_marg_y.barh(np.arange(0.5, H), df.groupby(['hour'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_marg_x.bar(np.arange(0.5, D), df.groupby(['day'])['Cost Dif.'].sum().to_numpy(), color='navy')
g.ax_joint.set_xticks(np.arange(0.5, D))
g.ax_joint.set_xticklabels(range(1, D + 1), rotation=0)
g.ax_joint.set_yticks(np.arange(0.5, H))
g.ax_joint.set_yticklabels(range(H), rotation=0)
# remove ticks between heatmap and histograms
g.ax_marg_x.tick_params(axis='x', bottom=False, labelbottom=False)
g.ax_marg_y.tick_params(axis='y', left=False, labelleft=False)
# remove ticks showing the heights of the histograms
g.ax_marg_x.tick_params(axis='y', left=False, labelleft=False)
g.ax_marg_y.tick_params(axis='x', bottom=False, labelbottom=False)
g.fig.set_size_inches(20, 8) # jointplot creates its own figure, the size can only be changed afterwards
# g.fig.subplots_adjust(hspace=0.3) # optionally more space for the tick labels
g.fig.subplots_adjust(hspace=0.05, wspace=0.02) # less space needed when there are no tick labels
plt.show()
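If a seaborn-free layout is acceptable, a similar figure can be sketched with plain matplotlib subplots and gridspec ratios. This is a different technique from the dummy jointplot above, shown only as a rough alternative: the ratios, spacing and figure size are assumptions to tune, and the data is random.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
D, H = 28, 24                          # days on x, hours on y, as in the question
cost = rng.uniform(10, 1000, (H, D))   # rows = hours, columns = days

fig, axs = plt.subplots(
    2, 2, figsize=(12, 6), sharex='col', sharey='row',
    gridspec_kw={"width_ratios": [6, 1], "height_ratios": [1, 4],
                 "wspace": 0.02, "hspace": 0.05})
ax_top, ax_corner = axs[0]
ax_heat, ax_right = axs[1]
ax_corner.axis("off")                  # the top-right cell stays empty

# aspect="auto" lets the heatmap fill its cell so it lines up with the bars
ax_heat.imshow(cost, cmap="Blues", aspect="auto", origin="lower")
ax_top.bar(np.arange(D), cost.sum(axis=0), color="navy")      # per-day totals
ax_right.barh(np.arange(H), cost.sum(axis=1), color="navy")   # per-hour totals
ax_heat.set_xlabel("day (0-based index)")
ax_heat.set_ylabel("hour")
```

The shared axes keep the bars aligned with the heatmap cells, since imshow places cell centers at integer positions, which is where bar and barh center their bars by default.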

Python keeps overwriting hist on previous plot but doesn't save it with the desired plot

I am saving two separate figures, each of which should contain two plots together.
The problem is that the first figure is fine, but the second one is drawn over the previous plot instead of on a new one, and in the saved figure I only find one of the plots:
This is the first figure, and it is saved correctly:
import scipy.stats as s
import numpy as np
import os
import pandas as pd
import openpyxl as pyx
import matplotlib
matplotlib.rcParams["backend"] = "TkAgg"
#matplotlib.rcParams['backend'] = "Qt4Agg"
#matplotlib.rcParams['backend'] = "nbAgg"
import matplotlib.pyplot as plt
import math
data = [336256, 620316, 958846, 1007830, 1080401]
pdf = np.array([ 0.00449982, 0.0045293 , 0.00455894, 0.02397463,
                 0.02395788, 0.02394114])
fig, ax = plt.subplots();
fig = plt.figure(figsize=(40,30))
x = np.linspace(np.min(data), np.max(data), 100);
plt.plot(x, s.exponweib.pdf(x, *s.exponweib.fit(data, 1, 1, loc=0, scale=2)))
plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)
text1= ' Weibull'
plt.savefig(text1+ '.png' )
datar =np.asarray(data)
mu, sigma = datar.mean() , datar.std() # mean and standard deviation
normal_std = np.sqrt(np.log(1 + (sigma/mu)**2))
normal_mean = np.log(mu) - normal_std**2 / 2
hs = np.random.lognormal(normal_mean, normal_std, 1000)
print(hs.max()) # some finite number
print(hs.mean()) # about 136519
print(hs.std()) # about 50405
count, bins, ignored = plt.hist(hs, 100, normed=True)
x = np.linspace(min(bins), max(bins), 10000)
pdfT = [];
for el in range (len(x)):
    pdfTmp = (math.exp(-(np.log(x[el]) - normal_mean)**2 / (2 * normal_std**2)))
    pdfT += [pdfTmp]
pdf = np.asarray(pdfT)
This is the second set :
fig, ax = plt.subplots();
fig = plt.figure(figsize=(40,40))
plt.plot(x, pdf, linewidth=2, color='r')
plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)
text= ' Lognormal '
plt.savefig(text+ '.png' )
The first plot saves the histogram together with the curve, whereas the second one only saves the curve.
Update 1: looking at this question, I found out that clearing the plot history keeps the figures from getting mixed up, but my second set of plots (the lognormal one) still is not saved together: I only get the curve and not the histogram.
This happens because you have set normed=True, which normalizes the area under the histogram to 1. Since your bins are very wide, the actual heights of the histogram bars are very small (in this case so small that they are not visible).
If you use
n, bins, _ = plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)
n will contain the y-value of your bins and you can confirm this yourself.
Also have a look at the documentation for plt.hist.
So if you set normed to False, the histogram will be visible.
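The effect can be checked numerically without plotting. The sketch below uses np.histogram with density=True on the data from the question, which applies the same area normalization; in recent Matplotlib releases the plt.hist keyword is also density, since normed was deprecated and later removed.

```python
import numpy as np

data = np.array([336256, 620316, 958846, 1007830, 1080401])
edges = np.linspace(data[0], data[-1], 100)   # 99 bins, as in the question

density, _ = np.histogram(data, bins=edges, density=True)
counts, _ = np.histogram(data, bins=edges)

bin_width = edges[1] - edges[0]
area = (density * np.diff(edges)).sum()

print(f"bin width:       {bin_width:.1f}")
print(f"max density bar: {density.max():.2e}")
print(f"max count bar:   {counts.max()}")
print(f"area under bars: {area:.3f}")
```

Each density bar equals count / (n * bin_width), so with bins that are thousands of units wide the bars are orders of magnitude smaller than a curve whose peak is near 1.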
Edit: number of bins
import numpy as np
import matplotlib.pyplot as plt
rand_data = np.random.uniform(0, 1.0, 100)
fig = plt.figure()
ax_1 = fig.add_subplot(211)
ax_1.hist(rand_data, bins=10)
ax_2 = fig.add_subplot(212)
ax_2.hist(rand_data, bins=100)
plt.show()
will give you two plots similar (since the data is random) to:
which shows how the number of bins changes the histogram.
A histogram visualises the distribution of your data along one dimension, so I'm not sure what you mean by number of inputs and bins.

Multicolored graph based on data frame values

I'm plotting a chart based on the data frame below. I want to show the graph line in different colours based on the column Condition. I'm trying the following code, but it shows only one colour throughout the graph.
df = pd.DataFrame(dict(
    Day=pd.date_range('2018-01-01', periods = 60, freq='D'),
    Utilisation = np.random.rand(60) * 100))
df = df.astype(dtype= {"Utilisation":"int64"})
df['Condition'] = np.where(df.Utilisation < 10, 'Winter',
                           np.where(df.Utilisation < 30, 'Summer', 'Spring'))
condition_map = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
df[['Utilisation','Day']].set_index('Day').plot(figsize=(10,4), rot=90,
                                                color=df.Condition.map(condition_map))
So, I assume you want a graph for each condition.
I would use groupby to separate the data.
# Color setting
season_color = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
# Create figure and axes
f, ax = plt.subplots(figsize = (10, 4))
# Loop over and plot each group of data
for cond, data in df.groupby('Condition'):
    ax.plot(data.Day, data.Utilisation, color = season_color[cond], label = cond)
# Fix datelabels
f.autofmt_xdate()
f.legend()
plt.show()
If you truly want the date ticks to be rotated 90 degrees, use autofmt_xdate(rotation = 90)
Update:
If you want to plot everything as a single line, it's a bit trickier, since a line can only have one color associated with it.
You could plot a line between each pair of points and split a line if it crosses a "color boundary", or check out this pyplot example: multicolored line
Another possibility is to plot a lot of scatter points between each pair of points and create your own colormap that represents your color boundaries.
To create a colormap (and norm) I use from_levels_and_colors
import matplotlib.colors
colors = ['#00BEC5', '#a0c483', '#F9746A']
boundaries = [0, 10, 30, 100]
cm, nrm = matplotlib.colors.from_levels_and_colors(boundaries, colors)
To connect each point with the next one you could shift the dataframe, but here I just zip the original df with a sliced version:
from itertools import islice
f, ax = plt.subplots(figsize = (10,4))
for (i,d0), (i,d1) in zip(df.iterrows(), islice(df.iterrows(), 1, None)):
    # hourly points between two consecutive days, colored by interpolated value
    d_range = pd.date_range(d0.Day, d1.Day, freq = 'h')
    y_val = np.linspace(d0.Utilisation, d1.Utilisation, d_range.size)
    ax.scatter(d_range, y_val, c = y_val, cmap = cm, norm = nrm)
f.autofmt_xdate()
plt.show()
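The "multicolored line" option mentioned above can be sketched with a LineCollection that reuses the colormap and norm from from_levels_and_colors. This is a minimal sketch, not the scatter approach of the answer: the data is random, and each segment is colored by its starting value, so segments are not split where they cross a boundary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import LineCollection
from matplotlib.colors import from_levels_and_colors

rng = np.random.default_rng(0)
df = pd.DataFrame({"Day": pd.date_range("2018-01-01", periods=60, freq="D"),
                   "Utilisation": rng.integers(0, 100, 60)})

cmap, norm = from_levels_and_colors([0, 10, 30, 100],
                                    ["#00BEC5", "#a0c483", "#F9746A"])

# Convert dates to Matplotlib's float representation so segments are numeric
x = mdates.date2num(df["Day"].to_numpy())
y = df["Utilisation"].to_numpy(dtype=float)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
# One segment per pair of consecutive points, shape (n - 1, 2, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap=cmap, norm=norm)
lc.set_array(y[:-1])        # color each segment by its starting value

fig, ax = plt.subplots(figsize=(10, 4))
ax.add_collection(lc)
ax.xaxis_date()             # show the float x values as dates
ax.autoscale_view()         # collections do not rescale the view by themselves
fig.autofmt_xdate()
```

Compared with the scatter version this draws 59 segments instead of thousands of points, at the cost of less precise color changes between two days.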

Matplotlib Venn diagram with legend

I'm using the matplotlib-venn package for drawing Venn diagrams in Python. This package works nicely for drawing Venn diagrams with two or three sets. However, when one of the sets is much larger than the others, the counts in the smaller circles can get close or overlap. Here's an example.
from collections import Counter
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn3
sets = Counter()
sets['01'] = 3000
sets['11'] = 3
sets['10'] = 5
setLabels = ['set1', 'set2']
plt.figure()
ax = plt.gca()
v = venn2(subsets = sets, set_labels = setLabels, ax = ax)
plt.title('Venn Diagram')
plt.show()
What I'm looking to do is move the counts (in this case, 3000, 3, and 5) to a legend with colors matching those in the diagram. I wasn't sure how to do this with matplotlib_venn.
You may replace the labels for the venn diagram with empty strings and instead create a legend from the patches of the venn and the respective counts as follows:
from collections import Counter
import matplotlib.pyplot as plt
from matplotlib_venn import venn2, venn3
sets = Counter()
sets['01'] = 3000
sets['11'] = 3
sets['10'] = 5
setLabels = ['set1', 'set2']
plt.figure()
ax = plt.gca()
v = venn2(subsets = sets, set_labels = setLabels, ax = ax)
h, l = [],[]
for i in sets:
    # remove label by setting them to empty string:
    v.get_label_by_id(i).set_text("")
    # append patch to handles list
    h.append(v.get_patch_by_id(i))
    # append count to labels list
    l.append(sets[i])
#create legend from handles and labels
ax.legend(handles=h, labels=l, title="counts")
plt.title('Venn Diagram')
plt.show()