conditional matplotlib fill_between for dataframe - pandas

It seemed easy, but I struggle.
I want to color the area between the axes conditionally, i.e. have it filled entirely with either green or red. I use a boolean DataFrame column to determine the colour:
df = pd.DataFrame([1,1,1,1,1,0,0,1,1,0,1,1,0,1,0,1], columns=['grow'])
df['green']=(df.grow>0) | (df.grow.shift(1)>0)
df['red']= (df.grow<=0) | (df.grow.shift(1)<=0)
But when I use this condition to fill between with this:
fig, axes = plt.subplots(nrows=1, ncols=1)
axes.fill_between(df.index, 0, 1,
where=(df.grow>0) , color = 'green', alpha = 0.1)
axes.fill_between(df.index, 0, 1,
where=(df.grow<=0) , color = 'red', alpha = 0.1)
the areas are not fully filled. How should I transform the condition passed to where to get a complete fill?

The reason your filled areas are not continuous becomes apparent when you plot df.grow itself as a line plot. The way you are using fill_between() implicitly assumes that your data resembles a step function, but it is actually more of a sawtooth (the edges are not 'sharp'): fill_between() only fills between consecutive x values where the condition holds at both ends, so the interval around each transition is left empty. One way around this is to resample the function with many repeated values, which makes the transitions between 0s and 1s sharper. numpy is a practical tool for this kind of operation. Here is an example of how to do it:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame([1,1,1,1,1,0,0,1,1,0,1,1,0,1,0,1], columns=['grow'])
df['green']=(df.grow>0) | (df.grow.shift(1)>0)
df['red']= (df.grow<=0) | (df.grow.shift(1)<=0)
fig, axes = plt.subplots(
nrows=2, ncols=2, gridspec_kw = {'height_ratios':[1, 3]}
)
axes[0,0].plot(df.index, df.grow)
axes[0,0].set_title('original function')
axes[1,0].fill_between(df.index, 0, 1,
where=(df.grow>0) , color = 'green', alpha = 0.1)
axes[1,0].fill_between(df.index, 0, 1,
where=(df.grow<=0) , color = 'red', alpha = 0.1)
axes[1,0].set_title('original shading')
N=100
x = np.linspace(df.index[0],df.index[-1],N*len(df.index))
y = np.repeat(df.grow, N)
axes[0,1].plot(x,y)
axes[0,1].set_title('sharper step function')
axes[1,1].fill_between(x, 0, 1,
where=(y>0) , color = 'green', alpha = 0.1, lw=0)
axes[1,1].fill_between(x, 0, 1,
where=(y<=0) , color = 'red', alpha = 0.1,lw=0)
axes[1,1].set_title('new_shading')
plt.show()
...and the result looks like this:
Hope this helps.
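An alternative that avoids resampling, sketched here under the assumption that each row should colour the interval from its index up to the next one (a "post" step): group the boolean column into contiguous runs and draw one axvspan per run. The helper name contiguous_runs is made up for this illustration and is not part of pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def contiguous_runs(values):
    """Return (start, end, flag) per run of equal flags; end is exclusive."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # close the current run at the end of the data or when the flag flips
        if i == len(values) or values[i] != values[start]:
            runs.append((start, i, bool(values[start])))
            start = i
    return runs


df = pd.DataFrame([1,1,1,1,1,0,0,1,1,0,1,1,0,1,0,1], columns=['grow'])
runs = contiguous_runs((df.grow > 0).tolist())

fig, ax = plt.subplots()
for start, end, is_green in runs:
    # assumption: row i covers the x interval [i, i + 1)
    ax.axvspan(start, end, color='green' if is_green else 'red',
               alpha=0.1, lw=0)
ax.set_xlim(0, len(df))
```

Because every span ends exactly where the next one starts, there are no unfilled gaps at the transitions.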

Related

Distinct color dots on Multidimensional Scaling plot (MDS) with plt.annotate()

Currently my MDS plot shows a number at each point. Instead of having the numberings on the MDS plot, I would like to replace them with dots. My desired output is that the point annotated as 'highest' will be a blue dot, the point annotated as 'lowest' will be a red dot, and all the other points will be grey dots.
Code to reproduce the MDS plot above.
import numpy as np
import scipy
import matplotlib.pyplot as plt
from sklearn.metrics import pairwise_distances #jaccard diss.
from sklearn import manifold # multidimensional scaling
foods_binary = np.random.randint(2, size=(100, 10)) #initial dataset
print(foods_binary.shape)
dis_matrix = pairwise_distances(foods_binary, metric = 'jaccard')
mds_model = manifold.MDS(n_components = 2, random_state = 123,
dissimilarity = 'precomputed')
mds_fit = mds_model.fit(dis_matrix)
mds_coords = mds_model.fit_transform(dis_matrix)
plt.figure()
plt.scatter(mds_coords[:,0],mds_coords[:,1],
facecolors = 'none', edgecolors = 'none') # points in white (invisible)
labels = [ 1, 2, 3, 4, 5, 'highest', 5, 6, 7, 8, 9, 'lowest']
for label, x, y in zip(labels, mds_coords[:,0], mds_coords[:,1]):
    plt.annotate(label, (x,y), xycoords = 'data')
plt.xlabel('First Dimension')
plt.ylabel('Second Dimension')
plt.title('Dissimilarity among food items')
plt.show()
How can I achieve what I desire above? Thanks for any suggestions.
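One possible approach, as a minimal sketch: build a list with one colour per label and pass it to scatter through the c argument. To keep the example self-contained, random coordinates stand in for mds_coords (an assumption), and the names special and point_colors are made up for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

labels = [1, 2, 3, 4, 5, 'highest', 5, 6, 7, 8, 9, 'lowest']

# Stand-in coordinates (assumption): use mds_coords[:len(labels)] instead.
rng = np.random.default_rng(123)
coords = rng.normal(size=(len(labels), 2))

# 'highest' -> blue, 'lowest' -> red, everything else -> grey
special = {'highest': 'blue', 'lowest': 'red'}
point_colors = [special.get(label, 'grey') for label in labels]

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=point_colors)
ax.set_xlabel('First Dimension')
ax.set_ylabel('Second Dimension')
ax.set_title('Dissimilarity among food items')
```

The annotate loop is no longer needed, since the colour itself identifies the two special points.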

Pandas: plot a dataframe with on its right side rectangle colored according to an array's values

I have a dataframe with 100 rows and 4 columns. I have an array (size 100,1) filled with values spanning between 0 and 1. I would like to plot my dataframe, with on its right side a rectangle which will take a color depending on the value of the array at a specific row (see the poor drawing I made, the array is written to help understanding what I want). I would like the colors to be a gradient, where 0 = dark blue, and 1 = bright red.
I know how to create a colormap, but this is slightly different.
Which function do you advise me to use ?
Here is some code I use for the plotting:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
rectangle_values = np.random.rand(100)
plt.figure(figsize=(15,15))
ax = sns.heatmap(df, cbar = None)
My solution would be to use plt.subplots to create two plots with the width_ratios argument set to something like 19:1. On the left-hand side you plot the data frame as usual; on the right-hand side you plot the vector. Notice that I am using vmin and vmax to set the boundaries (0, 1) required for the vector. For the requested colors I'm using Matplotlib's RdBu (red and blue) map, reversed so that low values are blue and high values are red. You can check the colors against the values; on this run the generated random values were [0.74, 0.96, 0.87, 0.50, 0.26].
df = pd.DataFrame(np.random.randint(0,5,size=(5, 4)), columns=list('ABCD'))
rectangle_values = pd.DataFrame(np.random.rand(5), columns=['foo'])
# one figure, two axes: wide heatmap on the left, narrow strip on the right
fig, (ax_left, ax_right) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [19, 1]})
sns.heatmap(df, cbar=False, ax=ax_left)
# 'RdBu_r' is the reversed RdBu map: 0 -> blue, 1 -> red
sns.heatmap(rectangle_values, cbar=False, cmap='RdBu_r', vmin=0, vmax=1, ax=ax_right)
plt.show()
And the output is:

define size of individual subplots side by side

I am using subplots side by side
plt.subplot(1, 2, 1)
# plot 1
plt.xlabel('MEM SET')
plt.ylabel('Memory Used')
plt.bar(inst_memory['MEMORY_SET_TYPE'], inst_memory['USED_MB'], alpha = 0.5, color = 'r')
# plot 2
plt.subplot(1, 2, 2)
plt.xlabel('MEM POOL')
plt.ylabel('Memory Used')
plt.bar(set_memory['POOL_TYPE'], set_memory['MEMORY_POOL_USED'], alpha = 0.5, color = 'g')
They have identical size. Is it possible to define the width of each subplot, so that the right one can be wider (it has more entries) and its text is not squeezed? Alternatively, would it be possible to replace the x-axis labels with numbers and add a legend such as 1: means xx, 2: means yyy?
I find GridSpec helpful for subplot arrangements, see this demo at matplotlib.
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
import pandas as pd
N=24
inst_memory = pd.DataFrame({'MEMORY_SET_TYPE': np.random.randint(0,3,N),
'USED_MB': np.random.randint(0,1000,N)})
set_memory = pd.DataFrame({'MEMORY_POOL_USED': np.random.randint(0,1000,N),
'POOL_TYPE': np.random.randint(0,10,N)})
fig = plt.figure()
gs = GridSpec(1, 2, width_ratios=[1, 2],wspace=0.3)
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])
ax1.bar(inst_memory['MEMORY_SET_TYPE'], inst_memory['USED_MB'], alpha = 0.5, color = 'r')
ax2.bar(set_memory['POOL_TYPE'], set_memory['MEMORY_POOL_USED'], alpha = 0.5, color = 'g')
You may need to adjust width_ratios and wspace to get the desired layout.
Also, rotating the text in x-axis might help, some info here.
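As a rough sketch of both suggestions together, the same layout can also be requested through plt.subplots with gridspec_kw, and the x tick labels rotated with tick_params. The category names and bar heights below are invented purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# right-hand axes twice as wide as the left-hand one
fig, (ax1, ax2) = plt.subplots(
    1, 2, gridspec_kw={'width_ratios': [1, 2], 'wspace': 0.3}
)

# made-up data: few categories on the left, many on the right
ax1.bar(['A', 'B', 'C'], [120, 340, 90], alpha=0.5, color='r')
pool_names = [f'POOL_{i}' for i in range(10)]
ax2.bar(pool_names, list(range(100, 1100, 100)), alpha=0.5, color='g')

# rotate the crowded labels so they do not overlap
ax2.tick_params(axis='x', labelrotation=45)

w1 = ax1.get_position().width
w2 = ax2.get_position().width
```

Here w2 comes out at about twice w1, matching the requested ratio.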

matplotlib line with different color depending on other variable

I want to plot a time-series using matplotlib and plot. However, I want the line color to change depending on another discrete time-series.
income = [5000, 5005, 5010, 6000, 6060, 6120, 7000]
job = [0, 0, 0, 1, 1, 1, 2]
I tried something like:
plt.plot(income, c=job, cmap='RdBu')
but that leads to 'Line2D' object has no property 'cmap'. I also tried:
plt.scatter(range(0, len(income)), income, c=job, cmap='RdBu')
but that does not draw the connecting lines, which is also not ideal. Is there any way to make a figure like the one below [created in Matlab] in Matplotlib?
I think a colormap is more useful for continuous data. For discrete data it is better to use a discrete list of colors, so that you can pair each color with a value of the type variable:
Code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
income = np.array([5000, 5005, 5010, 6000, 6060, 6120, 7000])
x = np.arange(len(income))
job = np.array([0, 0, 0, 1, 1, 1, 2]).astype('int')
# iterate over the distinct job values, pairing each with one color
for j, c in zip(np.unique(job), colors.TABLEAU_COLORS):
    plt.plot(x[job == j], income[job == j], 'o-', c=c)
plt.show()
Plot:
I used TABLEAU_COLORS but you can find another color list here if you wish.
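If a single unbroken line is wanted, a different technique from the per-group plot calls above is matplotlib's LineCollection: split the series into one segment per pair of neighbouring points and colour each segment separately. This is only a sketch; colouring a segment by the job value at its starting point is an assumption, and the palette is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

income = np.array([5000, 5005, 5010, 6000, 6060, 6120, 7000])
job = np.array([0, 0, 0, 1, 1, 1, 2])
x = np.arange(len(income))

# one (x, y) point per row, then one segment per pair of neighbours
points = np.column_stack([x, income])                    # shape (7, 2)
segments = np.stack([points[:-1], points[1:]], axis=1)   # shape (6, 2, 2)

# assumption: a segment takes the colour of the job at its starting point
palette = ['tab:blue', 'tab:orange', 'tab:green']
segment_colors = [palette[j] for j in job[:-1]]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=segment_colors, linewidths=2))
ax.autoscale()  # collections do not rescale the axes on their own
```

With this convention the last job value (2) occurs only at the final point, so its colour never appears; colouring by the end point of each segment (job[1:]) would show it instead.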

Pandas Colormap - Changing line color [duplicate]

If you have a Colormap cmap, for example:
cmap = matplotlib.cm.get_cmap('Spectral')
How can you get a particular colour out of it between 0 and 1, where 0 is the first colour in the map and 1 is the last colour in the map?
Ideally, I would be able to get the middle colour in the map by doing:
>>> do_some_magic(cmap, 0.5) # Return an RGBA tuple
(0.1, 0.2, 0.3, 1.0)
You can do this with the code below. The code in your question was actually very close to what you needed: all you have to do is call the cmap object you have.
import matplotlib.pyplot as plt
# plt.get_cmap is used here because matplotlib.cm.get_cmap was removed in Matplotlib 3.9
cmap = plt.get_cmap('Spectral')
rgba = cmap(0.5)
print(rgba) # (0.99807766255210428, 0.99923106502084169, 0.74602077638401709, 1.0)
For values outside of the range [0.0, 1.0] it will return the under and over colour (respectively). This, by default, is the minimum and maximum colour within the range (so 0.0 and 1.0). This default can be changed with cmap.set_under() and cmap.set_over().
For "special" numbers such as np.nan and np.inf the default is to use the 0.0 value, this can be changed using cmap.set_bad() similarly to under and over as above.
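A small sketch of the setters mentioned above. The colormap is copied first so the shared 'viridis' instance is left untouched; the colour choices are arbitrary, and the handling of np.nan as a "bad" value reflects recent Matplotlib releases.

```python
import copy

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# work on a copy so the registered colormap is not modified
cmap = copy.copy(plt.get_cmap('viridis'))
cmap.set_under('black')   # used for values below 0.0
cmap.set_over('white')    # used for values above 1.0
cmap.set_bad('grey')      # used for masked values and NaN

below = cmap(-0.5)
above = cmap(1.5)
missing = cmap(np.nan)
```

Each call returns an RGBA tuple, so below is opaque black and above is opaque white.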
Finally it may be necessary for you to normalize your data such that it conforms to the range [0.0, 1.0]. This can be done using matplotlib.colors.Normalize simply as shown in the small example below where the arguments vmin and vmax describe what numbers should be mapped to 0.0 and 1.0 respectively.
import matplotlib
norm = matplotlib.colors.Normalize(vmin=10.0, vmax=20.0)
print(norm(15.0)) # 0.5
A logarithmic normaliser (matplotlib.colors.LogNorm) is also available for data ranges with a large range of values.
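For illustration, a minimal sketch of LogNorm with arbitrarily chosen bounds: it maps vmin to 0.0 and vmax to 1.0, spacing values by their logarithm, and the result can be passed straight to a colormap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# bounds chosen only for the example; three decades from 1 to 1000
norm = LogNorm(vmin=1.0, vmax=1000.0)

# 10 sits one decade into a three-decade range, i.e. about one third
fractions = [float(norm(v)) for v in (1.0, 10.0, 1000.0)]

# the normalised value can be fed directly to a colormap
rgba = plt.get_cmap('viridis')(norm(10.0))
```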
(Thanks to both Joe Kington and tcaswell for suggestions on how to improve the answer.)
To get RGBA values as integers instead of floats, we can do
rgba = cmap(0.5,bytes=True)
So, to simplify the code based on the answer from Ffisegydd, it would look like this:
# import colormaps and the normalizer
from matplotlib import cm, colors
# normalize item number values to the colormap range
norm = colors.Normalize(vmin=0, vmax=1000)
# other possible colormaps: viridis, jet, Spectral
rgba_color = cm.jet(norm(400),bytes=True)
# 400 is one value between 0 and 1000
I once ran into a similar situation where I needed n colors from a colormap so that I could assign one color to each of my data series.
I have packaged code for this in a package called "mycolorpy".
You can pip install it using:
pip install mycolorpy
You can then do:
from mycolorpy import colorlist as mcp
import numpy as np
Example: To create a list of 5 hex strings from cmap "winter"
color1=mcp.gen_color(cmap="winter",n=5)
print(color1)
Output:
['#0000ff', '#0040df', '#0080bf', '#00c09f', '#00ff80']
Another example, generating a list of 16 colors from cmap bwr:
color2=mcp.gen_color(cmap="bwr",n=16)
print(color2)
Output:
['#0000ff', '#2222ff', '#4444ff', '#6666ff', '#8888ff', '#aaaaff', '#ccccff', '#eeeeff', '#ffeeee', '#ffcccc', '#ffaaaa', '#ff8888', '#ff6666', '#ff4444', '#ff2222', '#ff0000']
There is a python notebook with usage examples to better visualize this.
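If installing an extra package is not an option, a similar list can be sketched with Matplotlib alone by sampling the colormap at evenly spaced positions and converting each RGBA tuple with to_hex. The helper name sample_hex_colors is hypothetical, and this is not claimed to be how mycolorpy works internally.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex


def sample_hex_colors(cmap_name, n):
    """Return n hex colour strings sampled evenly from a named colormap."""
    cmap = plt.get_cmap(cmap_name)
    return [to_hex(cmap(t)) for t in np.linspace(0.0, 1.0, n)]


winter5 = sample_hex_colors('winter', 5)
print(winter5)
```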
Say you want to generate a list of colors from a cmap that is normalized to a given data. You can do that using:
import matplotlib.pyplot as plt
a=np.random.randint(1000, size=200)
color1=mcp.gen_color_normalized(cmap="seismic",data_arr=a)
plt.scatter(a,a,c=color1)
Output:
You can also reverse the color using:
color1=mcp.gen_color_normalized(cmap="seismic",data_arr=a,reverse=True)
plt.scatter(a,a,c=color1)
Output:
I had precisely this problem, but I needed sequential plots to have highly contrasting color. I was also doing plots with a common sub-plot containing reference data, so I wanted the color sequence to be consistently repeatable.
I initially tried simply generating colors randomly, reseeding the RNG before each plot. This worked OK (commented-out in code below), but could generate nearly indistinguishable colors. I wanted highly contrasting colors, ideally sampled from a colormap containing all colors.
I could have as many as 31 data series in a single plot, so I chopped the colormap into that many steps. Then I walked the steps in an order that ensured I wouldn't return to the neighborhood of a given color very soon.
My data is in a highly irregular time series, so I wanted to see the points and the lines, with the point having the 'opposite' color of the line.
Given all the above, it was easiest to generate a dictionary with the relevant parameters for plotting the individual series, then expand it as part of the call.
Here's my code. Perhaps not pretty, but functional.
import matplotlib.pyplot as plt
# plt.get_cmap is used because cm.get_cmap was removed in Matplotlib 3.9
cmap = plt.get_cmap('gist_rainbow') #('hsv') #('nipy_spectral')
max_colors = 31 # Constant, max number of series in any plot. Ideally prime.
color_number = 0 # Variable, incremented for each series.
def restart_colors():
    global color_number
    color_number = 0
    #np.random.seed(1)
def next_color():
    global color_number
    color_number += 1
    #color = tuple(np.random.uniform(0.0, 0.5, 3))
    color = cmap( ((5 * color_number) % max_colors) / max_colors )
    return color
def plot_args(): # Invoked for each plot in a series as: '**(plot_args())'
    mkr = next_color()
    clr = (1 - mkr[0], 1 - mkr[1], 1 - mkr[2], mkr[3]) # Give line inverse of marker color
    return {
        "marker": "o",
        "color": clr,
        "mfc": mkr,
        "mec": mkr,
        "markersize": 0.5,
        "linewidth": 1,
    }
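Why "ideally prime"? A short check, written as a sketch with the same constants as above: stepping through the colormap in strides of 5 visits every one of the 31 slots before repeating only because 5 and 31 share no common factor. The variable names are made up for the illustration.

```python
from math import gcd

max_colors = 31
stride = 5

# order in which colormap slots are visited, as in next_color()
visit_order = [(stride * k) % max_colors for k in range(1, max_colors + 1)]
covers_all = len(set(visit_order)) == max_colors

# circular distance between consecutive slots (how far apart the colours are)
min_gap = min(
    min(abs(a - b), max_colors - abs(a - b))
    for a, b in zip(visit_order, visit_order[1:])
)

# counter-example: with 30 slots the stride of 5 keeps hitting the same 6
reused = len({(stride * k) % 30 for k in range(1, 31)})
```

With a prime max_colors any stride smaller than it works, which is why a prime is a convenient choice.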
My context is JupyterLab and Pandas, so here's sample plot code:
restart_colors() # Repeatable color sequence for every plot
fig, axs = plt.subplots(figsize=(15, 8))
plt.title("%s + T-meter"%name)
# Plot reference temperatures:
axs.set_ylabel("°C", rotation=0)
for s in ["T1", "T2", "T3", "T4"]:
    df_tmeter.plot(ax=axs, x="Timestamp", y=s, label="T-meter:%s" % s, **(plot_args()))
# Other series gets their own axis labels
ax2 = axs.twinx()
ax2.set_ylabel(units)
for c in df_uptime_sensors:
    df_uptime[df_uptime["UUID"] == c].plot(
        ax=ax2, x="Timestamp", y=units, label="%s - %s" % (units, c), **(plot_args())
    )
fig.tight_layout()
plt.show()
The resulting plot may not be the best example, but it becomes more relevant when interactively zoomed in.
To build on the solutions from Ffisegydd and amaliammr, here's an example where we make a CSV representation of a custom colormap:
#! /usr/bin/env python3
import matplotlib
import numpy as np
vmin = 0.1
vmax = 1000
norm = matplotlib.colors.Normalize(np.log10(vmin), np.log10(vmax))
lognum = norm(np.log10([.5, 2., 10, 40, 150,1000]))
cdict = {
    'red':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 1, 1),
        (lognum[3], 0.8, 0.8),
        (lognum[4], .7, .7),
        (lognum[5], .7, .7)
    ),
    'green':
    (
        (0., .6, .6),
        (lognum[0], 0.8, 0.8),
        (lognum[1], 1, 1),
        (lognum[2], 1, 1),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
        (lognum[5], 0, 0)
    ),
    'blue':
    (
        (0., 0, 0),
        (lognum[0], 0, 0),
        (lognum[1], 0, 0),
        (lognum[2], 0, 0),
        (lognum[3], 0, 0),
        (lognum[4], 0, 0),
        (lognum[5], 1, 1)
    )
}
mycmap = matplotlib.colors.LinearSegmentedColormap('my_colormap', cdict, 256)
norm = matplotlib.colors.LogNorm(vmin, vmax)
colors = {}
count = 0
step_size = 0.001
for value in np.arange(vmin, vmax+step_size, step_size):
    count += 1
    print("%d/%d %f%%" % (count, vmax*(1./step_size), 100.*count/(vmax*(1./step_size))))
    rgba = mycmap(norm(value), bytes=True)
    color = (rgba[0], rgba[1], rgba[2])
    if color not in colors.values():
        colors[value] = color
print("value, red, green, blue")
for value in sorted(colors.keys()):
    rgb = colors[value]
    print("%s, %s, %s, %s" % (value, rgb[0], rgb[1], rgb[2]))
The object returned by a plotting call carries its own norm, so if you have a plot already made you can look up the color for a certain data value.
import matplotlib.pyplot as plt
import numpy as np
cmap = plt.cm.viridis
cm = plt.pcolormesh(np.random.randn(10, 10), cmap=cmap)
print(cmap(cm.norm(2.2)))
For a quick and dirty approach you can use the map directly.
Or you can just do what @amaliammr says.
data_size = 23 # range 0..22
colours = plt.cm.turbo
color_normal = colours.N/data_size
for i in range(data_size):
    col = colours.colors[int(i*color_normal)]