bi-directional bar chart with annotation in python plotly - plotly-python

I have a pandas dataset; a toy version can be created with this:
import pandas as pd
# creating a toy pandas dataframe
s1 = pd.Series(['dont have a mortgage',-31.8,'have mortgage',15.65])
s2 = pd.Series(['have utility bill arrears',-21.45,'',0])
s3 = pd.Series(['have interest only mortgage',-19.59,'',0])
s4 = pd.Series(['bank with challenger bank',-19.24,'bank with a traditional bank',32.71])
df = pd.DataFrame([list(s1),list(s2),list(s3),list(s4)], columns = ['label1','value1','label2','value2'])
I want to create a bar chart that looks like this version I hacked together in Excel.
I want to be able to supply RGB values to customise the two colours for the left and right bars (currently blue and orange).
I tried different versions using fig.add_trace(go.Bar(...)) but am brand new to Plotly and can't get anything to work with different coloured bars on one row and an annotation under each bar.
All help greatly appreciated!
thanks

To create a double-sided bar chart, you can create two side-by-side subplots with a shared y-axis. Each subplot is a horizontal bar chart with its own marker color.
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# define data set
s1 = pd.Series(['dont have a mortgage',-31.8,'have mortgage',15.65])
s2 = pd.Series(['have utility bill arrears',-21.45,'',0])
s3 = pd.Series(['have interest only mortgage',-19.59,'',0])
s4 = pd.Series(['bank with challenger bank',-19.24,'bank with a traditional bank',32.71])
df = pd.DataFrame([list(s1),list(s2),list(s3),list(s4)], columns = ['label1','value1','label2','value2'])
# create subplots
fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=True,
                    shared_yaxes=True, horizontal_spacing=0)
# add_trace(..., row=, col=) replaces the deprecated append_trace
fig.add_trace(go.Bar(y=df.index, x=df.value1, orientation='h', width=0.4, showlegend=False, marker_color='#4472c4'), row=1, col=1)
fig.add_trace(go.Bar(y=df.index, x=df.value2, orientation='h', width=0.4, showlegend=False, marker_color='#ed7d31'), row=1, col=2)
fig.update_yaxes(showticklabels=False) # hide all yticks
The annotations need to be added separately:
annotations = []
for i, row in df.iterrows():
    if row.label1 != '':
        # value at the end of the left bar
        annotations.append({
            'xref': 'x',
            'yref': 'y',
            'y': i,
            'x': row.value1,
            'text': str(row.value1),
            'xanchor': 'right',
            'showarrow': False})
        # label just below the left bar
        annotations.append({
            'xref': 'x',
            'yref': 'y',
            'y': i - 0.3,
            'x': -1,
            'text': row.label1,
            'xanchor': 'right',
            'showarrow': False})
    if row.label2 != '':
        # value at the end of the right bar
        annotations.append({
            'xref': 'x2',
            'yref': 'y2',
            'y': i,
            'x': row.value2,
            'text': str(row.value2),
            'xanchor': 'left',
            'showarrow': False})
        # label just below the right bar
        annotations.append({
            'xref': 'x2',
            'yref': 'y2',
            'y': i - 0.3,
            'x': 1,
            'text': row.label2,
            'xanchor': 'left',
            'showarrow': False})
fig.update_layout(annotations=annotations)
fig.show()


How can I use matplotlib.pyplot to customize geopandas plots?

What is the difference between geopandas plots and matplotlib plots? Why aren't all keywords available?
In geopandas there is markersize, but not markeredgecolor...
In the example below I plot a pandas df with some styling, then transform the pandas df to a geopandas df. Simple plotting works, but additional styling does not.
This is just an example. In my geopandas plots I would like to customize markers, legends, etc. How can I access the relevant matplotlib objects?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)
df = pd.DataFrame(Y, X)
plt.plot(X,Y,linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
# alternatively:
# df.plot(linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
plt.show()
# create GeoDataFrame from df (the index holds the X values, column 0 holds Y)
df.reset_index(inplace=True)
df.rename(columns={'index': 'X', 0: 'Y'}, inplace=True)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['X'], df['Y']))
gdf.plot(linewidth = 3., color = 'k', markersize = 9) # working
gdf.plot(linewidth = 3., color = 'k', markersize = 9, markeredgecolor = 'k') # not working
plt.show()
You're probably confused by the fact that both libraries name the method .plot(). In matplotlib, plt.plot specifically creates a matplotlib.lines.Line2D object, which also holds the markers and their styling.
Geopandas assumes you want to plot geographic data and draws it as a collection of paths (matplotlib.collections.PathCollection). That has properties such as facecolor and edgecolor, but none of the Line2D marker properties. The facecolor comes into play whenever a path is closed, such as a polygon or a filled marker.
For points, geopandas uses a bit of a trick: it draws a scatter, where the PathCollection holds a single marker path (a circle built from cubic Bézier "CURVE4" segments) that is repeated at each point's position.
You can explore what's happening if you capture the axes that geopandas returns:
ax = gdf.plot(...)
ax.get_children() returns all artists that have been added to the axes. Since this is a simple plot, it's easy to see that the PathCollection is the actual data; the other artists draw the axes, spines, etc.:
[<matplotlib.collections.PathCollection at 0x1c05d5879d0>,
<matplotlib.spines.Spine at 0x1c05d43c5b0>,
<matplotlib.spines.Spine at 0x1c05d43c4f0>,
<matplotlib.spines.Spine at 0x1c05d43c9d0>,
<matplotlib.spines.Spine at 0x1c05d43f1c0>,
<matplotlib.axis.XAxis at 0x1c05d036590>,
<matplotlib.axis.YAxis at 0x1c05d43ea10>,
Text(0.5, 1.0, ''),
Text(0.0, 1.0, ''),
Text(1.0, 1.0, ''),
<matplotlib.patches.Rectangle at 0x1c05d351b10>]
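Retrieving the Path that is drawn shows the codes used. The result is the same whether you plot 5 points or 1024, because this Path is the marker shape; the point positions themselves are stored as the collection's offsets (pcoll.get_offsets()):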
pcoll = ax.get_children()[0] # the first artist is the PathCollection
path = pcoll.get_paths()[0] # it only contains 1 Path
print(path.codes) # show the codes used.
# array([ 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
# 4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8)
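A quick check with plain Matplotlib supports reading that Path as the marker rather than the data. This sketch assumes geopandas draws points with ax.scatter (it does not import geopandas itself); it scatters 5 points and inspects the single Path and the offsets:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.path import Path

fig, ax = plt.subplots()
n_points = 5
pcoll = ax.scatter(np.arange(n_points), np.zeros(n_points), marker='o')

paths = pcoll.get_paths()
codes = paths[0].codes
print(len(paths))                                  # one marker path shared by all points
print(codes[0] == Path.MOVETO, codes[-1] == Path.CLOSEPOLY)
print(set(codes[1:-1].tolist()) == {int(Path.CURVE4)})  # the circle is all cubic Béziers
print(pcoll.get_offsets().shape)                   # the actual point positions, one row per point
```

Changing n_points changes only the offsets, not the Path.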
Some more info about how these paths work can be found at:
https://matplotlib.org/stable/tutorials/advanced/path_tutorial.html
So, long story short: you have the same kind of keywords as in Matplotlib, but they are the keywords of a PathCollection (facecolor, edgecolor, linewidth, ...) rather than of the Line2D object you might expect.
You can always flip the order around: start with a Matplotlib figure/axes you create yourself, and pass that axes to Geopandas whenever you want to plot something (e.g. gdf.plot(ax=ax)). That can make it easier and more intuitive when you also want to plot other things in the same axes. It does require a bit more discipline to make sure the (spatial) coordinates etc. match.
I personally almost always do that, because it lets me do most of the plotting with the same Matplotlib API, which admittedly has a slightly steeper learning curve. Overall, though, I find it easier than dealing with each package's slightly different interpretation of Matplotlib under the hood (e.g. geopandas, seaborn, xarray). But that really depends on where you're coming from.
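To see the keyword split in isolation, here is a small sketch with plain Matplotlib rather than geopandas. It assumes, as far as I can tell from how geopandas plots points, that GeoDataFrame.plot draws them via ax.scatter and forwards extra keywords to it, so edgecolor (not markeredgecolor) is the keyword to use there:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection
from matplotlib.lines import Line2D

X = np.linspace(-6, 6, 33)
Y = np.sinc(X)
fig, (ax1, ax2) = plt.subplots(1, 2)

# ax.plot -> Line2D: the marker* keywords belong here
(line,) = ax1.plot(X, Y, color='k', marker='o', markersize=9,
                   markerfacecolor='.75', markeredgecolor='k')

# ax.scatter -> PathCollection: similar styling uses facecolor/edgecolor/linewidths;
# s is the marker area in points**2, so markersize 9 corresponds to s=9**2
pcoll = ax2.scatter(X, Y, s=9**2, facecolor='.75', edgecolor='k', linewidths=1.5)

# a Line2D-only keyword is rejected: matplotlib raises AttributeError for unknown properties
try:
    ax2.scatter(X, Y, markeredgecolor='k')
    rejected = False
except AttributeError:
    rejected = True

print(type(line).__name__, type(pcoll).__name__, rejected)
```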
Thank you for your detailed answer. Based on it, I came up with this simplified code from my real project.
I have a shapefile shp and some point data df that I want to plot. shp is plotted with geopandas and df with matplotlib.pyplot, so there is no need to convert the point data into a GeoDataFrame gdf as I did initially.
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
# read marker data (places with coordinates)
df = pd.read_csv("../obese_pct_by_place.csv")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['sweref99_lng'], df['sweref99_lat']))
# read shapefile
shp = gpd.read_file("../../SWEREF_Shapefiles/KommunSweref99TM/Kommun_Sweref99TM_region.shp")
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_aspect('equal')
shp.plot(ax=ax)
# plot obesity markers
# geopandas, no edgecolor here
# gdf.plot(ax=ax, marker='o', c='r', markersize=gdf['obese'] * 25)
# matplotlib.pyplot with edgecolor
plt.scatter(df['sweref99_lng'], df['sweref99_lat'], c='r', edgecolor='k', s=df['obese'] * 25)
plt.show()

Conditional Colours for Holoviews Heatmap

Just wondering how I can create a custom colour scheme based on conditions for a HoloViews heatmap. I have created a column of colours based on conditions in the data. However, when I plot it, the standard cmap appears, and my colour scheme only shows on a cell when I hover over it. Does anyone know how I can turn off the standard colour map, or apply my conditional colours instead? Example code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from datetime import datetime
import holoviews as hv
from holoviews import opts
import panel as pn
from bokeh.resources import INLINE
from holoviews import dim
hv.extension('bokeh', 'matplotlib')
pd.options.plotting.backend = 'holoviews'
green = '#00FF00'
amber = '#FFFF00'
red = '#FF0000'
Data = [['A', 'Foo', 0.2] , ['B', 'Bar', 0.9], ['C', 'Cat', 0.7]]
df = pd.DataFrame(Data, columns = ['Name', 'Category', 'Value'])
df['colors'] = df.apply(lambda row: green if row['Value'] >= 0.9 else
amber if row['Value'] < 0.9 and row['Value'] >= 0.7 else
red if row['Value'] < 0.7 else '#8A2BE2', axis = 1)
df_hm = hv.HeatMap(df,kdims=['Category','Name'], vdims=['Value', 'colors']).opts(width=900, height=400, color = hv.dim('colors'), tools=['hover'])
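As an aside, the nested conditional inside apply can be written in vectorised form with np.select, which picks the first matching condition for each row. This is only a sketch of building the colors column (the name fallback is mine); it does not by itself change how HoloViews colours the cells:

```python
import numpy as np
import pandas as pd

green = '#00FF00'
amber = '#FFFF00'
red = '#FF0000'
fallback = '#8A2BE2'  # used when no condition matches (e.g. a missing Value)

Data = [['A', 'Foo', 0.2], ['B', 'Bar', 0.9], ['C', 'Cat', 0.7]]
df = pd.DataFrame(Data, columns=['Name', 'Category', 'Value'])

# np.select takes the first True condition, so list the bands from highest down
conditions = [df['Value'] >= 0.9, df['Value'] >= 0.7, df['Value'] < 0.7]
choices = [green, amber, red]
df['colors'] = np.select(conditions, choices, default=fallback)
print(df)
```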
When this code is run, I get the following, which is the standard cmap:
However, when I hover over a cell, the colour changes to the scheme I want. Unfortunately I can't add a picture to show it. Does anyone know how I can make it show only the conditional colouring I am after?
I've added a picture of what is happening. When I hover over a cell you can see the conditional colouring; however, the cmap colour is overlaid on top of it, which I want to remove.
Current behavior
Thanks a bunch for any help!
You are using the wrong keyword in your opts() call: use cmap instead of color.
Here is a very basic example, adapted from here.
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
x = ["foo", "foo", "foo", "bar", "bar", "bar", "baz", "baz", "baz"]
y = ["foo", "bar", "baz", "foo", "bar", "baz", "foo", "bar", "baz"]
z = [0, 1, 2, 3, 4, 5, 6, 7, 8]
colors = ['#00FF00','#FFFF00','#FF0000','#FFFF00','#FF0000', '#00FF00','#FF0000', '#00FF00','#FFFF00']
hv.HeatMap((x,y,z)).opts(width=450, height=400, cmap=colors, tools=['hover'])
Output

matplotlib stacked bar chart with zero centered

I have a dataset like below.
T/F  Value  category
T    1      A
F    3      B
T    5      C
F    7      A
T    8      B
...  ...    ...
So I want to draw a bar chart like the one below: bars of the same category share the same position, the bars are centered on zero, and F values are drawn below the horizontal line while T values are drawn above it.
How can I make this chart with matplotlib.pyplot, or another library? An example would be very helpful.
One approach involves making the False values negative, and then creating a Seaborn barplot with T/F as hue. You might want to make a copy of the data if you can't change the original.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
data = pd.DataFrame({'T/F': ['T', 'F', 'T', 'F', 'T'],
'Value': [1, 3, 5, 7, 8],
'category': ['A', 'B', 'C', 'A', 'B']})
data['Value'] = np.where(data['T/F'] == 'T', data['Value'], -data['Value'])
ax = sns.barplot(data=data, x='category', y='Value', hue='T/F', dodge=False, palette='turbo')
ax.axhline(0, lw=2, color='black')
plt.tight_layout()
plt.show()
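If you'd rather avoid seaborn, a matplotlib-only sketch is below. Note that sns.barplot aggregates repeated category/hue rows with the mean by default (passing estimator=sum changes that). The sketch instead sums Value per category and T/F flag, which is an assumption about what "number of F" should mean; the colours and the symmetric y-limits are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show a window
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({'T/F': ['T', 'F', 'T', 'F', 'T'],
                     'Value': [1, 3, 5, 7, 8],
                     'category': ['A', 'B', 'C', 'A', 'B']})

# sum Value per category and flag; missing combinations (e.g. C/F) become 0
totals = data.pivot_table(index='category', columns='T/F', values='Value',
                          aggfunc='sum', fill_value=0)

fig, ax = plt.subplots()
ax.bar(totals.index, totals['T'], color='tab:blue', label='T')     # above zero
ax.bar(totals.index, -totals['F'], color='tab:orange', label='F')  # mirrored below zero
ax.axhline(0, lw=2, color='black')

# symmetric limits keep zero in the vertical centre
lim = totals.to_numpy().max() * 1.1
ax.set_ylim(-lim, lim)
ax.legend()
plt.tight_layout()
```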

Draw semicircle chart using matplotlib

Is matplotlib capable of creating semicircle charts like this:
I have tried matplotlib.pyplot.pie without success.
It doesn't seem like there is a built-in half-circle type in matplotlib. However, a workaround can be made based on matplotlib.pyplot.pie:
Append the total sum of the data and assign white color to it.
Overlay a white circle in the center using an Artist object (reference).
Sample Code:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'white']
# plot
fig = plt.figure(figsize=(8,6),dpi=100)
ax = fig.add_subplot(1,1,1)
ax.pie(val, labels=label, colors=colors)
ax.add_artist(plt.Circle((0, 0), 0.6, color='white'))
plt.show()
Output:
My solution:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'k']
# plot
plt.figure(figsize=(8,6),dpi=100)
wedges, texts = plt.pie(val, wedgeprops=dict(width=0.4, edgecolor='w'), labels=label, colors=colors)
# hide the wedge that holds the "blank" half, leaving only the top semicircle
wedges[-1].set_visible(False)
plt.show()
Output:
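Another option, which avoids the hidden blank wedge altogether, is to draw the segments directly with matplotlib.patches.Wedge, giving each one its share of 180 degrees. A minimal sketch; half_donut is a hypothetical helper name, and labels are left out for brevity:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Wedge


def half_donut(ax, values, colors, width=0.4, radius=1.0):
    """Draw values as a semicircular donut, from 180 degrees (left) to 0 (right)."""
    fractions = np.asarray(values, dtype=float) / np.sum(values)
    # angle boundaries, starting at 180 degrees and sweeping clockwise to 0
    edges = 180 - 180 * np.concatenate([[0], np.cumsum(fractions)])
    wedges = []
    for start, end, color in zip(edges[:-1], edges[1:], colors):
        # Wedge draws counterclockwise from theta1 to theta2, so pass (end, start)
        w = Wedge((0, 0), radius, end, start, width=width,
                  facecolor=color, edgecolor='w')
        ax.add_patch(w)
        wedges.append(w)
    ax.set_xlim(-radius, radius)
    ax.set_ylim(0, radius)
    ax.set_aspect('equal')
    ax.axis('off')
    return wedges


fig, ax = plt.subplots(figsize=(8, 4))
wedges = half_donut(ax, [1, 2, 3], ['red', 'blue', 'green'])
```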

seaborn or matplotlib line chart, line color depending on variable

I have a pandas dataframe with three columns, Date(timestamp), Color('red' or 'blue') and Value(int).
I am currently getting a line chart from it with the following code:
import matplotlib.pyplot as plt
import pandas as pd
Dates=['01/01/2014','02/01/2014','03/01/2014','04/01/2014','05/01/2014','06/01/2014','07/01/2014']
Values=[3,4,6,5,4,5,4]
Colors=['red','red','blue','blue','blue','red','red']
df=pd.DataFrame({'Dates':Dates,'Values':Values,'Colors':Colors})
df['Dates']=pd.to_datetime(df['Dates'],dayfirst=True)
grouped = df.groupby('Colors')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot(ax=ax, x="Dates", y="Values", label=key, color=key)
plt.show()
I'd like the line color to depend on the 'Colors' column. How can I achieve that?
I have seen a similar question for scatter plots here, but it doesn't seem I can apply the same solution to a time-series line chart.
My output is currently this:
I am trying to achieve something like this (one line only, but several colors)
As I said, you can find the answer in the link I attached in the comment:
import matplotlib.pyplot as plt
import pandas as pd
Dates = ['01/01/2014', '02/01/2014', '03/01/2014', '03/01/2014', '04/01/2014', '05/01/2014']
Values = [3, 4, 6, 6, 5, 4]
Colors = ['red', 'red', 'red', 'blue', 'blue', 'blue']
df = pd.DataFrame({'Dates': Dates, 'Values': Values, 'Colors': Colors})
df['Dates'] = pd.to_datetime(df['Dates'], dayfirst=True)
grouped = df.groupby('Colors')
fig, ax = plt.subplots(1)
for key, group in grouped:
    group.plot(ax=ax, x="Dates", y="Values", label=key, color=key)
plt.show()
When the color changes, you need to add an extra point (here 03/01/2014 appears once in red and once in blue) so that the line stays continuous.
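The extra-point trick relies on each color forming one contiguous run: with groupby, the two separate red runs in the question's original data would be joined by a single red line. A sketch that handles any sequence uses matplotlib.collections.LineCollection with one segment per pair of consecutive points. Coloring each segment by its starting point is my assumption; other rules are possible:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

Dates = ['01/01/2014', '02/01/2014', '03/01/2014', '04/01/2014', '05/01/2014', '06/01/2014', '07/01/2014']
Values = [3, 4, 6, 5, 4, 5, 4]
Colors = ['red', 'red', 'blue', 'blue', 'blue', 'red', 'red']
df = pd.DataFrame({'Dates': Dates, 'Values': Values, 'Colors': Colors})
df['Dates'] = pd.to_datetime(df['Dates'], dayfirst=True)

# x as matplotlib date numbers so the segments sit on a date axis
x = mdates.date2num(df['Dates'].to_numpy())
y = df['Values'].to_numpy()
points = np.column_stack([x, y])
# one segment per consecutive pair of points: shape (n - 1, 2, 2)
segments = np.stack([points[:-1], points[1:]], axis=1)
# color each segment with the color of its starting point
lc = LineCollection(segments, colors=df['Colors'].iloc[:-1].tolist(), linewidths=2)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale_view()  # add_collection does not rescale the view on its own
ax.xaxis_date()
```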