How can I set boxplot color by rainbow in matplotlib

I want to create a boxplot to compare several data sets. My plot looks like this:
How can I add colors like this:

You can color the boxes by following this example. Beyond that, you need to map the quantity you have in mind to colors on the "rainbow" colormap using the matplotlib.cm module. Here is an example with random test data, in which the colors are mapped from the means.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Random test data
test_data = [np.random.normal(mean, 1, 100) for mean in range(50)]
fig, axes = plt.subplots(figsize=(12, 16))
# Horizontal box plot
bplot = axes.boxplot(test_data,
                     vert=False,         # horizontal box alignment
                     patch_artist=True)  # fill with color
# Fill with colors
cmap = cm.ScalarMappable(cmap='rainbow')
test_mean = [np.mean(x) for x in test_data]
for patch, color in zip(bplot['boxes'], cmap.to_rgba(test_mean)):
    patch.set_facecolor(color)
plt.show()

A colormap object is itself callable: it accepts values between 0 and 1 and returns a color, so you can call it after "normalising" your data to that range. Using the matplotlib example on boxplots:
import matplotlib.pyplot as plt
import numpy as np
# Random test data
np.random.seed(123)
all_data = [np.random.normal(0, 5, 100) for std in range(1, 21)]
fig, ax = plt.subplots(nrows=1, figsize=(9, 4))
# rectangular box plot
bplot = ax.boxplot(all_data, notch=False, sym='', vert=False, patch_artist=True)
cm = plt.get_cmap('rainbow')
colors = [cm(val/len(all_data)) for val in range(len(all_data))]
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)
plt.show()
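If the colors should follow a data value rather than the box index, the "normalising" step can be made explicit with matplotlib.colors.Normalize. The following is only a minimal sketch and is not taken from either answer above: mapping the median, the variable names, and the Agg backend line (there only so the code runs without a display) are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Hypothetical test data: ten samples with increasing means
rng = np.random.default_rng(0)
data = [rng.normal(mean, 1, 100) for mean in range(10)]
medians = [np.median(d) for d in data]

# Normalize rescales the medians linearly to [0, 1]; the colormap is then called on that
norm = Normalize(vmin=min(medians), vmax=max(medians))
cmap = plt.get_cmap("rainbow")
colors = [cmap(norm(m)) for m in medians]

fig, ax = plt.subplots()
bplot = ax.boxplot(data, patch_artist=True)
for patch, color in zip(bplot["boxes"], colors):
    patch.set_facecolor(color)

# A colorbar built from the same norm and colormap documents the mapping
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="median")
```

Removing the matplotlib.use("Agg") line and adding plt.show() at the end displays the figure interactively.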

Related

Trying to place text in mpl just above the first yticklabel

I am having difficulties moving the text "Rank" exactly one line above the first label without using guesswork, as I have different chart types with variable sizes, widths and also paddings between the labels and bars.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)))
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
ax.text(-5,30,"Rank")
plt.show()
Using transData.transform didn't get me any further. The problem seems to be that ax.text() with the position params of (0,0) aligns with the start of the bars and not the yticklabels which I need, so getting the exact position of yticklabels relative to the axis would be helpful.
The following approach creates an offset_copy transform, using "axes coordinates". The top left corner of the main plot is at position 0, 1 in axes coordinates. The ticks have a "pad" (between label and tick mark) and a "padding" (length of the tick mark), both measured in "points".
The text can be right-aligned, just like the tick labels. With "bottom" as the vertical alignment, it sits just above the main plot. If that distance is too small, you could try ax.text(0, 1.01, ...) to move it a bit higher.
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
import pandas as pd
import numpy as np
from matplotlib import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1, 30)))
df.plot.barh(width=0.8, ax=ax, legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
tick = ax.yaxis.get_major_ticks()[-1] # get information of one of the ticks
padding = tick.get_pad() + tick.get_tick_padding()
trans_offset = offset_copy(ax.transAxes, fig=fig, x=-padding, y=0, units='points')
ax.text(0, 1, "Rank", ha='right', va='bottom', transform=trans_offset)
# optionally also use tick.label1.get_fontproperties()
plt.tight_layout()
plt.show()
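To see what the offset transform does, the display position of the anchor point can be compared with and without the offset. This is a small sketch under assumptions: the 36-point padding is a made-up value standing in for tick.get_pad() + tick.get_tick_padding(), and the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy

fig, ax = plt.subplots()
padding = 36.0  # hypothetical padding in points

# Same construction as in the answer: axes coordinates, shifted left by `padding` points
trans_offset = offset_copy(ax.transAxes, fig=fig, x=-padding, y=0, units='points')

base = ax.transAxes.transform((0, 1))     # top-left corner of the axes, in pixels
shifted = trans_offset.transform((0, 1))  # same corner after the offset, in pixels
shift_px = shifted - base

# One point is 1/72 inch, so the expected horizontal shift is -padding * dpi / 72 pixels
expected_px = -padding * fig.dpi / 72
```

Because the offset is expressed in points, the text keeps the same distance from the axes regardless of the figure size or the data limits.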
I answered my own question while Johan was posting his answer, which is pretty good and what I wanted. However, I'm posting mine anyway, as it uses an entirely different approach. Here I add a "ghost" row to the dataframe and label it appropriately, which solves the problem:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)),columns=["val"])
#add a temporary header
new_row = pd.DataFrame({"val":0}, index=[0])
df = pd.concat([df[:],new_row]).reset_index(drop = True)
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
# Set the top label to "Rank"
yticklabels = [t for t in ax.get_yticklabels()]
yticklabels[-1]="Rank"
# Left align all labels
for t in ax.get_yticklabels():
    t.set_ha("left")
ax.set_yticklabels(yticklabels)
# delete the top bar effectively by setting its height to 0
ax.patches[-1].set_height(0)
plt.show()
The advantage is that the text always sits a constant distance above the top label; the disadvantage is that transforming your dataframe for this task is a bit "patchy" in the most literal sense.

how to set the distance between bars and axis using matplot lib [duplicate]

I'm currently learning how to import data and work with it in matplotlib, and I'm having trouble even though I have the exact code from the book.
This is what the plot looks like. How can I get rid of the white space at the start and the end of the x-axis?
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
    #    print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
There is an automatic margin set at the edges, which ensures the data fits nicely within the axis spines. In this case such a margin is probably desired on the y axis. By default it is set to 0.05 in units of axis span.
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (same for y of course). If you want to get rid of the margin entirely and forever, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
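The effect of the two options can be checked by reading the axis limits back. This is a minimal sketch with made-up data, assuming the default axes.xmargin of 0.05; the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = list(range(10))      # hypothetical data: x runs from 0 to 9
y = [v ** 2 for v in x]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

ax1.plot(x, y)                # default: 5% of the data span is added on each side
ax2.plot(x, y)
ax2.margins(x=0)              # no margin on the x axis
ax3.plot(x, y)
ax3.set_xlim(min(x), max(x))  # explicit limits, same visual result as ax2

default_xlim = ax1.get_xlim()  # approximately (-0.45, 9.45)
tight_xlim = ax2.get_xlim()    # (0.0, 9.0)
manual_xlim = ax3.get_xlim()   # (0.0, 9.0)
```

The difference between the last two is that margins(x=0) keeps autoscaling active when more data is added, whereas set_xlim fixes the limits.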
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = list(range(1, max_x_value + 1))
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs

Align multi-line ticks in Seaborn plot

I have the following heatmap:
I've broken up the category names by each capital letter and then capitalised them. This achieves a centering effect across the labels on my x-axis by default which I'd like to replicate across my y-axis.
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
cmap="Blues", annot_kws=annot_kws,
mask=mask, vmin=0, vmax=5000,
cbar_kws={"shrink": .8}, square=True,
linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't seem to find an align parameter for setting the ticks and was wondering if and how this centering could be achieved in seaborn/matplotlib?
I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
columns=['Donald\nDuck','Mickey\nMouse','Han\nSolo',
'Luke\nSkywalker','Yoda','Santa\nClause','Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
square=True, linewidths=.5, cbar_kws={"shrink": .5})
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops above. These get each label and then set its horizontal alignment. (You can also change the vertical alignment with set_va().)
The code above produces this:
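If the lines within each y tick label should also be centered relative to each other, as they are on the x axis, one option worth trying is Text.set_multialignment, which controls how the lines of a multi-line text are aligned inside its block. The following is a sketch under assumptions and is not part of the answer above: it uses plain matplotlib with imshow instead of seaborn, the label names and data are made up, and the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

names = ['DONALD\nDUCK', 'MICKEY\nMOUSE', 'YODA']  # hypothetical multi-line labels
data = np.arange(9).reshape(3, 3)                  # hypothetical matrix

fig, ax = plt.subplots()
ax.imshow(data, cmap="Blues")
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)

for label in ax.get_yticklabels():
    label.set_ha('right')               # anchor the text block next to the tick
    label.set_multialignment('center')  # center the lines within the block
    label.set_rotation(0)

fig.canvas.draw()  # force a render so the label settings are exercised
```

Here set_ha decides where the whole label sits relative to the tick, while set_multialignment only affects how the individual lines line up with each other.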

how to customize color legend when using for loop in matplotlib, scatter

I want to draw a 3D scatter, in which the data is colored by group. Here is the data sample:
aa=pd.DataFrame({'a':[1,2,3,4,5],
'b':[2,3,4,5,6],
'c':[1,3,4,6,9],
'd':[0,0,1,2,3],
'e':['abc','sdf','ert','hgf','nhkm']})
Here, a, b, c are axis x, y, z. e is the text shown in the scatter. I need d to group the data and show different colors.
Here is my code:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
zdirs = aa.loc[:,'e'].__array__()
xs = aa.loc[:,'a'].__array__()
ys = aa.loc[:,'b'].__array__()
zs = aa.loc[:,'c'].__array__()
colors = aa.loc[:,'d'].__array__()
colors1=np.where(colors==0,'grey',
np.where(colors==1,'yellow',
np.where(colors==2,'green',
np.where(colors==3,'pink','red'))))
for i in range(len(zdirs)):  # plot each point + its index as text above
    ax.scatter(xs[i],ys[i],zs[i],color=colors1[i])
    ax.text(xs[i],ys[i],zs[i], '%s' % (str(zdirs[i])), size=10, zorder=1, color='k')
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('c')
plt.show()
But I do not know how to put a legend on the plot. I would like my legend to look like this:
The colors and the numbers should match and be ordered.
Could anyone help me with how to customize the color bar?
First of all, I've taken the liberty of reducing your code a bit:
I'd suggest creating a ListedColormap to map integer -> color, which allows you to pass the color column via c=aa['d'] (note it's c=, not color=!).
You don't need __array__() here; in the code below you can use aa['a'] directly.
Finally, you can add an empty scatter plot for each color in the ListedColormap, which ax.legend() can then render correctly.
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
aa=pd.DataFrame({'a':[1,2,3,4,5],
'b':[2,3,4,5,6],
'c':[1,3,4,6,9],
'd':[0,0,1,2,3],
'e':['abc','sdf','ert','hgf','nhkm']})
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
cmap = ListedColormap(['grey', 'yellow', 'green', 'pink','red'])
# vmin/vmax pin integer group i to the i-th listed color
ax.scatter(aa['a'],aa['b'],aa['c'],c=aa['d'],cmap=cmap,vmin=0,vmax=cmap.N-1)
for x,y,z,label in zip(aa['a'],aa['b'],aa['c'],aa['e']):
    ax.text(x,y,z,label,size=10,zorder=1)
# Create a legend through an *empty* scatter plot for each listed color
for i in range(cmap.N):
    ax.scatter([], [], color=cmap(i), label=str(i))
ax.legend()
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('c')
plt.show()
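A different way to get an ordered legend, shown here only as a sketch and not as part of the answer above, is to issue one scatter call per group and give each call a label. The group_colors dictionary is an assumption built from the colors in the question, and the Agg backend line is there only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

aa = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [2, 3, 4, 5, 6],
                   'c': [1, 3, 4, 6, 9],
                   'd': [0, 0, 1, 2, 3],
                   'e': ['abc', 'sdf', 'ert', 'hgf', 'nhkm']})

# Hypothetical mapping from group value to color, following the question's choices
group_colors = {0: 'grey', 1: 'yellow', 2: 'green', 3: 'pink'}

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# groupby iterates over the groups in sorted order, so the legend is ordered too
for group, sub in aa.groupby('d'):
    ax.scatter(sub['a'], sub['b'], sub['c'],
               color=group_colors[group], label=str(group))

for x, y, z, text in zip(aa['a'], aa['b'], aa['c'], aa['e']):
    ax.text(x, y, z, text, size=10, zorder=1)

legend = ax.legend(title='d')
```

With this approach only the groups that actually occur in the data appear in the legend.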

Plotting masked numpy array leads to incorrect colorbar

I'm trying to create a custom color bar for a matplotlib PolyCollection. Everything seems ok until I attempt to plot a masked array. The color bar no longer shows the correct colors even though the plot does. Is there a different procedure for plotting masked arrays?
I'm using matplotlib 1.4.0 and numpy 1.8.
Here's my plotting code:
import numpy
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection
vertices = numpy.load('vertices.npy')
array = numpy.load('array.npy')
# Take 2d slice out of 3D array
slice_ = array[:, :, 0:1].flatten(order='F')
fig, ax = plt.subplots()
poly = PolyCollection(vertices, array=slice_, edgecolors='black', linewidth=.25)
cm = mpl.colors.ListedColormap([(1.0, 0.0, 0.0), (.2, .5, .2)])
poly.set_cmap(cm)
bounds = [.1, .4, .6]
norm = mpl.colors.BoundaryNorm(bounds, cm.N)
fig.colorbar(poly, ax=ax, orientation='vertical', boundaries=bounds, norm=norm)
ax.add_collection(poly, autolim=True)
ax.autoscale_view()
plt.show()
Here's what the plot looks like:
However, when I plot a masked array with the following change before the slicing:
array = numpy.ma.array(array, mask=array > .5)
I get a color bar that now shows only a single color, even though both colors are (correctly) still shown in the plot.
Is there some trick to keeping a colorbar consistent when plotting a masked array? I know I can use cm.set_bad to change the color of masked values, but that's not quite what I'm looking for. I want the color bar to show up the same between these two plots since both colors and the color bar itself should remain unchanged.
Pass the BoundaryNorm to the PolyCollection, poly. Otherwise, poly.norm gets set to a matplotlib.colors.Normalize instance by default:
In [119]: poly.norm
Out[119]: <matplotlib.colors.Normalize at 0x7faac4dc8210>
I have not stepped through the source code sufficiently to explain exactly what is happening in the code you posted, but I speculate that the interaction of this Normalize instance and the BoundaryNorm makes the range of values seen by fig.colorbar different from what you expected.
In any case, if you pass norm=norm to PolyCollection, then the result looks correct:
import numpy
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.collections as mcoll
import matplotlib.colors as mcolors
numpy.random.seed(4)
N, M = 3, 3
vertices = numpy.random.random((N, M, 2))
array = numpy.random.random((1, N, 2))
# vertices = numpy.load('vertices.npy')
# array = numpy.load('array.npy')
array = numpy.ma.array(array, mask=array > .5)
# Take 2d slice out of 3D array
slice_ = array[:, :, 0:1].flatten(order='F')
fig, ax = plt.subplots()
bounds = [.1, .4, .6]
cm = mpl.colors.ListedColormap([(1.0, 0.0, 0.0), (.2, .5, .2)])
norm = mpl.colors.BoundaryNorm(bounds, cm.N)
poly = mcoll.PolyCollection(
vertices,
array=slice_,
edgecolors='black', linewidth=.25, norm=norm)
poly.set_cmap(cm)
fig.colorbar(poly, ax=ax, orientation='vertical')
ax.add_collection(poly, autolim=True)
ax.autoscale_view()
plt.show()
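To see what the BoundaryNorm does on its own, independent of the collection and the colorbar, it can be applied directly to a small masked array. This is only an illustrative sketch with made-up values; it does not explain the colorbar behaviour discussed above.

```python
import numpy as np
import matplotlib.colors as mcolors

cm = mcolors.ListedColormap([(1.0, 0.0, 0.0), (.2, .5, .2)])
bounds = [.1, .4, .6]
norm = mcolors.BoundaryNorm(bounds, cm.N)

# Hypothetical values: one per bin, plus one masked entry
values = np.ma.array([0.2, 0.5, 0.55], mask=[False, False, True])

bins = norm(values)  # bin index per value; the mask is carried through
colors = cm(bins)    # integer indices select the listed colors directly
```

The two unmasked values fall into bins 0 and 1 and therefore get the first and second listed color, while the masked entry stays masked and is drawn with the colormap's "bad" color.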