Conditional Colours for Holoviews Heatmap - pandas

I'm wondering how I can create a custom colour scheme based on conditions for a HoloViews heatmap. I have created a column of colours derived from conditions on the data. However, when I plot it, the standard cmap is displayed and my colour scheme only appears on a cell when I hover over it. Does anyone know how I can suppress the standard colour map, or set things up so that my conditional colours are shown instead? Example code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from datetime import datetime
import holoviews as hv
from holoviews import opts
import panel as pn
from bokeh.resources import INLINE
from holoviews import dim
hv.extension('bokeh', 'matplotlib')
pd.options.plotting.backend = 'holoviews'
green = '#00FF00'
amber = '#FFFF00'
red = '#FF0000'
Data = [['A', 'Foo', 0.2] , ['B', 'Bar', 0.9], ['C', 'Cat', 0.7]]
df = pd.DataFrame(Data, columns = ['Name', 'Category', 'Value'])
df['colors'] = df.apply(lambda row: green if row['Value'] >= 0.9 else
                        amber if row['Value'] < 0.9 and row['Value'] >= 0.7 else
                        red if row['Value'] < 0.7 else '#8A2BE2', axis = 1)
df_hm = hv.HeatMap(df,kdims=['Category','Name'], vdims=['Value', 'colors']).opts(width=900, height=400, color = hv.dim('colors'), tools=['hover'])
When this code is run, the plot shows the standard cmap rather than my colours.
However, when I hover over a cell, its colour changes to the scheme I want. Unfortunately, I can't add a picture to show it. Does anyone know how I can make the plot show only the conditional colouring that I am after?
I've added a picture of what is happening. When I hover over a cell you can see the conditional colouring, but the cmap colour is overlaid on top of it, which I want to remove.
Thanks a bunch for any help!

You are using the wrong keyword in your opts() call. You have to use cmap instead of color.
Here is a very basic example, adapted from here.
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
factors = ["a", "b", "c", "d", "e", "f", "g", "h"]
x = [50, 40, 65, 10, 25, 37, 80, 60]
scatter = hv.Scatter((factors, x))
spikes = hv.Spikes(scatter)
x = ["foo", "foo", "foo", "bar", "bar", "bar", "baz", "baz", "baz"]
y = ["foo", "bar", "baz", "foo", "bar", "baz", "foo", "bar", "baz"]
z = [0, 1, 2, 3, 4, 5, 6, 7, 8]
colors = ['#00FF00','#FFFF00','#FF0000','#FFFF00','#FF0000', '#00FF00','#FF0000', '#00FF00','#FFFF00']
hv.HeatMap((x,y,z)).opts(width=450, height=400, cmap=colors, tools=['hover'])
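To make the question's thresholds explicit, the colour choice can also be written as a small lookup that is independent of the plotting library. The sketch below reuses the hex codes and cut-offs from the question; `EDGES`, `COLOURS` and `band_colour` are illustrative names, not part of HoloViews.

```python
from bisect import bisect_right

# Thresholds and colours taken from the question (names are illustrative).
EDGES = [0.7, 0.9]                            # boundaries between the bands
COLOURS = ['#FF0000', '#FFFF00', '#00FF00']   # red, amber, green

def band_colour(value):
    """Return the colour of the band that `value` falls into."""
    # bisect_right counts how many edges are <= value, which is the band index.
    return COLOURS[bisect_right(EDGES, value)]

values = [0.2, 0.9, 0.7]
colours = [band_colour(v) for v in values]
print(colours)
```

Keeping the band edges and the colours in two ordered lists means the same data can drive both the DataFrame column and the plot options. HoloViews documents a `color_levels` plot option alongside `cmap` for discrete colour intervals; whether it accepts these edges directly is worth checking against the installed version.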

Related

Styling the background color of pandas index cell

I want to change the background styling color of the pandas table.
Example:
df = pd.DataFrame({
'group': ['A','B','C','D'],
'var1': [38, 1.5, 30, 4],
'var2': [29, 10, 9, 34],
'var3': [8, 39, 23, 24],
'var4': [7, 31, 33, 14],
'var5': [28, 15, 32, 14]
})
df.set_index('group', inplace=True)
I want the background color of the index cell (and just the index cell) A,C in blue and B,D in red.
I looked at the styling documentation but I could not find an example that matches this case.
Based on the limitations of df.style, coloring index and columns is not implemented.
If you are okay with coloring the values based on the index, you can do something like this: create a dictionary and apply the color based on the index using index.map.
c1 = 'background-color: blue'
c2 = 'background-color: red'
d = {"A":c1,"B":c2,"C":c1,"D":c2}
df.style.apply(lambda x: x.index.map(d))
We can manipulate the underlying HTML by looking at the <th> tags. One convenient way is with the BeautifulSoup library:
from bs4 import BeautifulSoup
# get the HTML representation
html = df.to_html()
# form the soup, naming the parser explicitly
soup = BeautifulSoup(html, "html.parser")
# the color categories
blues = {"A", "C"}
reds = {"B", "D"}
# for each possible "table header" tag...
for tag in soup.find_all("th"):
    # if tag's content is in `blues`...
    if tag.text in blues:
        # change the HTML style attribute accordingly
        tag["style"] = "background-color: blue;"
    # similar here..
    elif tag.text in reds:
        tag["style"] = "background-color: red;"
# get back the new HTML
new_html = str(soup)
Then, in an IPython notebook for example:
from IPython.display import HTML
HTML(new_html)
This displays the table with the index cells colored.
This feature is not currently available and the next release for this might be December 2021 (https://github.com/pandas-dev/pandas/pull/41893). However, you can easily work around this by using table styles. See my answer below:
df = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index=["A", "B", "C", "D"])
green = [{'selector': 'th', 'props': 'background-color: green'}]
red = [{'selector': 'th', 'props': 'background-color: red'}]
df.style.set_table_styles({"A": green, "B": red, "C": green, "D": red}, axis=1)
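With more than a handful of index labels, the dictionary passed to set_table_styles can be built from a label-to-color mapping instead of being written out by hand. A minimal sketch, assuming the same 'th' selector and props string as above; `index_styles` is a hypothetical helper, not a pandas function.

```python
def index_styles(colour_by_label):
    """Build the per-label dict used with Styler.set_table_styles (hypothetical helper)."""
    return {
        label: [{'selector': 'th', 'props': f'background-color: {colour}'}]
        for label, colour in colour_by_label.items()
    }

# Same colors as in the answer above.
styles = index_styles({'A': 'green', 'B': 'red', 'C': 'green', 'D': 'red'})
print(styles['A'])
```

The result can then be passed as the first argument in place of the hand-written dictionary, i.e. df.style.set_table_styles(styles, axis=1).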

bi-directional bar chart with annotation in python plotly

I have a pandas dataset with a toy version that can be created with this
#creating a toy pandas dataframe
s1 = pd.Series(['dont have a mortgage',-31.8,'have mortgage',15.65])
s2 = pd.Series(['have utility bill arrears',-21.45,'',0])
s3 = pd.Series(['have interest only mortgage',-19.59,'',0])
s4 = pd.Series(['bank with challenger bank',-19.24,'bank with a traditional bank',32.71])
df = pd.DataFrame([list(s1),list(s2),list(s3),list(s4)], columns = ['label1','value1','label2','value2'])
I want to create a bar chart that looks like this version I hacked together in excel
I want to be able to supply RGB values to customise the two colours for the left and right bars (currently blue and orange)
I tried different versions using "fig.add_trace(go.Bar" but I am brand new to plotly and can't get anything to work with different coloured bars on one row and an annotation under each bar.
All help greatly appreciated!
thanks
To create a double-sided bar chart, you can create two subplots with shared x- and y-axes. Each subplot is a horizontal bar chart with its own marker color:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# define data set
s1 = pd.Series(['dont have a mortgage',-31.8,'have mortgage',15.65])
s2 = pd.Series(['have utility bill arrears',-21.45,'',0])
s3 = pd.Series(['have interest only mortgage',-19.59,'',0])
s4 = pd.Series(['bank with challenger bank',-19.24,'bank with a traditional bank',32.71])
df = pd.DataFrame([list(s1),list(s2),list(s3),list(s4)], columns = ['label1','value1','label2','value2'])
# create subplots
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                    shared_yaxes=True, horizontal_spacing=0)
fig.append_trace(go.Bar(y=df.index, x=df.value1, orientation='h', width=0.4, showlegend=False, marker_color='#4472c4'), 1, 1)
fig.append_trace(go.Bar(y=df.index, x=df.value2, orientation='h', width=0.4, showlegend=False, marker_color='#ed7d31'), 1, 2)
fig.update_yaxes(showticklabels=False) # hide all yticks
The annotations need to be added separately:
annotations = []
for i, row in df.iterrows():
    if row.label1 != '':
        annotations.append({
            'xref': 'x1',
            'yref': 'y1',
            'y': i,
            'x': row.value1,
            'text': row.value1,
            'xanchor': 'right',
            'showarrow': False})
        annotations.append({
            'xref': 'x1',
            'yref': 'y1',
            'y': i-0.3,
            'x': -1,
            'text': row.label1,
            'xanchor': 'right',
            'showarrow': False})
    if row.label2 != '':
        annotations.append({
            'xref': 'x2',
            'yref': 'y2',
            'y': i,
            'x': row.value2,
            'text': row.value2,
            'xanchor': 'left',
            'showarrow': False})
        annotations.append({
            'xref': 'x2',
            'yref': 'y2',
            'y': i-0.3,
            'x': 1,
            'text': row.label2,
            'xanchor': 'left',
            'showarrow': False})
fig.update_layout(annotations=annotations)
fig.show()
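The four annotation dictionaries differ only in the subplot number, position, text and anchor, so they can be produced by one small function. The sketch below builds the same dictionaries from plain tuples so it runs without plotly; `bar_annotation` is a hypothetical helper and the two rows are copied from the toy data.

```python
def bar_annotation(axis, x, y, text, xanchor):
    """Return one annotation dict for subplot number `axis` (hypothetical helper)."""
    return {
        'xref': f'x{axis}',
        'yref': f'y{axis}',
        'x': x,
        'y': y,
        'text': text,
        'xanchor': xanchor,
        'showarrow': False,
    }

# Rows mimic the toy DataFrame: (label1, value1, label2, value2).
rows = [
    ('dont have a mortgage', -31.8, 'have mortgage', 15.65),
    ('have utility bill arrears', -21.45, '', 0),
]

annotations = []
for i, (label1, value1, label2, value2) in enumerate(rows):
    if label1 != '':
        annotations.append(bar_annotation(1, value1, i, value1, 'right'))    # value at the bar end
        annotations.append(bar_annotation(1, -1, i - 0.3, label1, 'right'))  # label under the bar
    if label2 != '':
        annotations.append(bar_annotation(2, value2, i, value2, 'left'))
        annotations.append(bar_annotation(2, 1, i - 0.3, label2, 'left'))

print(len(annotations))
```

The resulting list has the same shape as the one built above and can be passed to fig.update_layout(annotations=annotations) in the same way.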

Draw semicircle chart using matplotlib

Is matplotlib capable of creating semicircle charts like this:
I have tried matplotlib.pyplot.pie without success.
It doesn't seem like there is a built-in half-circle chart type in matplotlib. However, a workaround can be built on matplotlib.pyplot.pie:
1. Append the total sum of the data as an extra wedge and assign the color white to it.
2. Overlay a white circle in the center using an Artist object (reference).
Sample Code:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'white']
# plot
fig = plt.figure(figsize=(8,6),dpi=100)
ax = fig.add_subplot(1,1,1)
ax.pie(val, labels=label, colors=colors)
ax.add_artist(plt.Circle((0, 0), 0.6, color='white'))
fig.show()
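The reason appending sum(val) produces a half circle is that the padded total becomes twice the original sum, so the real wedges together span exactly 180 degrees. A quick check of that arithmetic in plain Python, with no plotting involved (`padded` and `wedge_angles` are illustrative names):

```python
val = [1, 2, 3]
padded = val + [sum(val)]          # extra blank wedge equal to the sum of the data

total = sum(padded)                # twice the original sum
wedge_angles = [360 * v / total for v in padded]

visible = sum(wedge_angles[:-1])   # angle covered by the real data
print(wedge_angles, visible)
```

Which half stays blank depends on where the first wedge starts and in which direction the wedges are laid out; in matplotlib these are controlled by the startangle and counterclock parameters of pie.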
My solution:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'k']
# plot
plt.figure(figsize=(8,6),dpi=100)
wedges, labels=plt.pie(val, wedgeprops=dict(width=0.4,edgecolor='w'),labels=label, colors=colors)
# I tried this method
wedges[-1].set_visible(False)
plt.show()

Formatting Date labels using Seaborn FacetGrid

I want to make a facet grid with variable names as the columns, and departments as the rows, and each small chart is a scatter chart of y=value and x=date
My data is sort of like this:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime
import matplotlib.dates as mdates
import random
datelist = pd.date_range(start="march 1 2020", end="may 20 2020", freq="w").tolist()
varlist = ["x", "y", "z", "x", "y", "z", "x", "y", "z", "x", "y", "z"]
deptlist = ["a", "a", "b", "a", "a", "b", "a", "a", "b", "a", "a", "b"]
vallist = random.sample(range(10, 30), 12)
df = pd.DataFrame({'date': datelist, 'value': vallist, 'variable': varlist, 'department': deptlist})
Here is what I have so far. It almost works, except I want to see dates along the bottom that are not squished together, so I would like to see "3/1 4/1 5/1" instead of full dates. But I can't figure out how to format it.
plt.style.use('seaborn-darkgrid')
xformatter = mdates.DateFormatter("%m-%d")
g = sns.FacetGrid(df, row="department", col="variable", sharey='row')
g = g.map(plt.plot, "date", "value", marker='o', markersize=0.7)
datelist = pd.date_range(start="march 1 2020", end="june 1 2020", freq="MS").tolist()
g.set(xticks=datelist)
This is pretty close, but notice the dates along the bottom x axes. They are all scrunched together. That's why I tried to use a special date formatter but couldn't get it to work. Really what I would like is that each date shows up as mon-dd and that I can control how many tick marks appear there.
You can access the Axes objects of the FacetGrid as g.axes (a 2D array). You could iterate over this array and change the properties of every Axes, but in your case sharex=True (the default), which means that changing the x-axis of one subplot changes all of the subplots at the same time.
g = sns.FacetGrid(df, row="department", col="variable", sharey='row')
g = g.map(plt.plot, "date", "value", marker='o', markersize=0.7)
xformatter = mdates.DateFormatter("%m/%d")
g.axes[0,0].xaxis.set_major_formatter(xformatter)
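DateFormatter takes strftime-style codes, so the label text can be previewed with the standard library before touching the plot. The sketch below shows that "%m/%d" gives zero-padded labels, and one way to get the unpadded "3/1" style the question asks for; the tick dates are the month starts used in the question, and the variable names are illustrative.

```python
from datetime import date

# Month starts from March to June 2020, as in the question's xticks.
ticks = [date(2020, month, 1) for month in (3, 4, 5, 6)]

padded = [d.strftime("%m/%d") for d in ticks]      # zero-padded month/day
compact = [f"{d.month}/{d.day}" for d in ticks]    # no leading zeros

print(padded)
print(compact)
```

In matplotlib, the unpadded variant could be applied by wrapping an expression like the one for `compact` in matplotlib.ticker.FuncFormatter (using matplotlib.dates.num2date to convert the tick value), and the number of ticks can be controlled with a locator such as matplotlib.dates.MonthLocator passed to xaxis.set_major_locator.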

Plotting lists with different number of elements in matplotlib

I have a list of numpy arrays, each potentially having a different number of elements, such as:
[array([55]),
array([54]),
array([], dtype=float64),
array([48, 55]),]
I would like to plot this, where each array has an abscissa (x value) assigned, such as [1,2,3,4] so that the plot should show the following points: [[1,55], [2, 54], [4, 48], [4, 55]].
Is there a way I can do that with matplotlib? Or how can I transform the data with numpy or pandas first so that it can be plotted?
What you want to do is chain the original arrays and generate a new array with the "abscissas". There are many ways to concatenate; one of the most efficient is using itertools.chain.
import itertools
from numpy import array
x = [array([55]), array([54]), array([]), array([48, 55])]
ys = list(itertools.chain(*x))
# this will be [55, 54, 48, 55]
# generate abscissas
xs = list(itertools.chain(*[[i+1]*len(x1) for i, x1 in enumerate(x)]))
Now you can plot it easily with matplotlib as below:
import matplotlib.pyplot as plt
plt.plot(xs, ys)
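The same flattening can be checked on plain lists, which stand in for the numpy arrays here (an assumption made so the sketch runs with the standard library only). It uses itertools.chain.from_iterable for the values and a nested comprehension for the abscissas.

```python
import itertools

# Plain lists standing in for the numpy arrays of the question.
groups = [[55], [54], [], [48, 55]]

# Flatten the values; empty groups contribute nothing.
ys = list(itertools.chain.from_iterable(groups))

# Repeat each 1-based position once per element in its group.
xs = [i for i, group in enumerate(groups, start=1) for _ in group]

print(list(zip(xs, ys)))
```

Note that plt.plot(xs, ys) joins consecutive points with a line; passing a marker format such as 'o' as the third argument draws them as separate points instead.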
If you want to have different markers for different groups of data (the colours are automatically cycled by matplotlib):
import numpy as np
import matplotlib.pyplot as plt
markers = ['o', #'circle',
'v', #'triangle_down',
'^', #'triangle_up',
'<', #'triangle_left',
'>', #'triangle_right',
'1', #'tri_down',
'2', #'tri_up',
'3', #'tri_left',
'4', #'tri_right',
'8', #'octagon',
's', #'square',
'p', #'pentagon',
'h', #'hexagon1',
'H', #'hexagon2',
'D', #'diamond',
'd', #'thin_diamond'
]
n_markers = len(markers)
a = [10.*np.random.random(int(np.random.random()*10)) for i in range(n_markers)]
fig = plt.figure()
ax = fig.add_subplot(111)
for i, data in enumerate(a):
    xs = data.shape[0]*[i,] # makes the abscissas list
    marker = markers[i % n_markers] # picks a valid marker
    ax.plot(xs, data, marker, label='data %d, %s'%(i, marker))
ax.set_xlim(-1, 1.4*len(a))
ax.set_ylim(0, 10)
ax.legend(loc=None)
fig.tight_layout()
Note that the y-axis limits are hard-coded; change them as needed. The 1.4*len(a) is meant to leave room on the right side of the graph for the legend.
The example above has no points at x=0 (they would be dark blue circles) because the randomly assigned size of that data set was zero, but you can easily add 1 if you don't want to use x=0.
Using pandas to create a numpy array with nans inserted when an array is empty or shorter than the longest array in the list...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
arr_list = [np.array([55]),
np.array([54]),
np.array([], dtype='float64'),
np.array([48, 55]),]
df = pd.DataFrame(arr_list)
list_len = len(df)
repeats = len(list(df))
vals = df.values.flatten()
xax = np.repeat(np.arange(list_len) + 1, repeats)
df_plot = pd.DataFrame({'xax': xax, 'vals': vals})
plt.scatter(df_plot.xax, df_plot.vals);
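The padding step can be illustrated without pandas: each row is extended with NaN up to the length of the longest row, and each x value is repeated once per column. Plain lists stand in for the arrays here, and the variable names mirror the ones above where possible.

```python
import math

rows = [[55], [54], [], [48, 55]]
width = max(len(r) for r in rows)

# Pad every row with NaN so that all rows have the same length.
padded = [r + [math.nan] * (width - len(r)) for r in rows]

# Flatten row by row and repeat each 1-based x value once per column.
vals = [v for r in padded for v in r]
xax = [i + 1 for i in range(len(rows)) for _ in range(width)]

print(xax)
print(vals)
```

The NaN entries act as placeholders that keep xax and vals the same length; a scatter plot normally leaves such points undrawn.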
With x as your list:
[plt.plot(np.repeat(i,len(x[i])), x[i],'.') for i in range(len(x))]
plt.show()
@Alessandro Mariani's answer based on itertools made me think of another way to generate an array containing the data I needed. In some cases it may be more compact. It is also based on itertools.chain:
import itertools
from numpy import array
y = [array([55]), array([54]), array([]), array([48, 55])]
x = array([1,2,3,4])
d = array(list(itertools.chain(*[itertools.product([t], n) for t, n in zip(x,y)])))
d is now the following array:
array([[ 1, 55],
[ 2, 54],
[ 4, 48],
[ 4, 55]])