Plot enum values in dataframe using pyplot segmented bar - pandas

I have a Series of (timestamp, enum value) describing when a system was in a given state; the state is described by the enum.
Time | State
--------------
0 | A
3 | B
4 | A
7 | C
9 | D
I'd like to visualize the state changes in a bar plot by filling each state forward to the next timestamp, and using a different color for each enum value:
|
|__________________________________________
| A | B | A | C | D |
|___________|___|___________|_______|_____|
|
---------------------------------------------
0 1 2 3 4 5 6 7 8 9 10
Any advice? I've looked into Line Collections and horizontal bars, but Line Collections seem clunky and hbar seems to be for scalar values. I'm hoping to find an elegant idiomatic solution.

You can create bar charts specifying left starting points and widths:
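import matplotlib.pyplot as plt

color = {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'yellow'}
durations = df.Time.shift(-1) - df.Time
# Skip the last row: it has no following timestamp, so its duration is NaN.
for state, start, width in list(zip(df.State, df.Time, durations))[:-1]:
    # barh draws a horizontal bar that starts at `left` and is `width` long.
    plt.barh(y=0, width=width, left=start, height=0.8, align='edge', color=color[state], label=state)
plt.legend()
You can also call
plt.gca().get_yaxis().set_visible(False)
to hide the y-axis, use better colors, and make this figure less ugly (it's hard to make it more ugly).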
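
For reference, a minimal self-contained sketch of the same idea using `Axes.broken_barh`, which takes a list of `(start, width)` pairs per call. The end time of 10 is an assumption read off the x-axis of the sketch in the question, and the `state_spans` helper and the color choices are hypothetical names introduced only for this example:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question. END_TIME = 10 is an assumption (taken from
# the x-axis of the ASCII sketch); the data itself does not say when D ends.
df = pd.DataFrame({"Time": [0, 3, 4, 7, 9], "State": ["A", "B", "A", "C", "D"]})
END_TIME = 10
COLORS = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green", "D": "tab:red"}


def state_spans(frame, end_time):
    """Return {state: [(start, width), ...]}, filling each state forward."""
    # The width of a span is the gap to the next timestamp; the last row
    # is filled up to end_time instead of producing NaN.
    widths = frame["Time"].shift(-1, fill_value=end_time) - frame["Time"]
    spans = {}
    for state, start, width in zip(frame["State"], frame["Time"], widths):
        spans.setdefault(state, []).append((start, width))
    return spans


spans = state_spans(df, END_TIME)
fig, ax = plt.subplots(figsize=(8, 2))
for state, xranges in spans.items():
    # One call per state, so each state gets exactly one legend entry.
    ax.broken_barh(xranges, (0, 1), facecolors=COLORS[state], label=state)
ax.set_xlim(0, END_TIME)
ax.get_yaxis().set_visible(False)
ax.legend(ncol=len(spans), loc="upper center", bbox_to_anchor=(0.5, -0.2))
```

Grouping the spans by state before plotting avoids the duplicate legend entries you get when a state such as A occurs more than once. Call `plt.show()` afterwards if you are not in a notebook.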

I came up with a solution that uses LineCollection. My dissatisfaction with it is that the LineCollection elements seem scale-invariant (they appear the same width no matter what the y-axis scale is), which makes it hard to manipulate. Because of this shortcoming I think I prefer the bar solution.
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patches as mpatches
import pandas as pd
df = pd.DataFrame(zip([0, 3, 4, 7, 9, 12], ['A', 'B', 'A', 'C', 'D', 'A']),
columns=['Time', 'State'])
df['Duration'] = df['Time'].shift(-1) - df['Time']
# Showing how we can use HTML color names
state_to_color_map = {
'A': 'LightBlue',
'B': 'gray',
'C': 'LightPink',
'D': 'MediumSeaGreen'
}
fig = plt.figure(figsize=(8, 4))
ax = fig.gca()
plot_height = 0 # Can loop this to plot multiple bars
for state, state_color in state_to_color_map.items():
    segments = [[(start, plot_height), (start + duration, plot_height)]
                for (start, duration) in
                df[df['State'] == state][['Time', 'Duration']].values]
    plot_segments = mpl.collections.LineCollection(
        segments=segments,
        # mcolors.to_rgba(...) works in matplotlib 2.2.2;
        # on matplotlib 1.5.3 use mcolors.colorConverter.to_rgba(...) instead.
        colors=[mcolors.to_rgba(state_color)] * len(segments),
        linewidths=50)
    ax.add_collection(plot_segments)
ax.set_ylim(-1, 1)
ax.set_xlim(0, 12)
# Legend
patches = []
for state, color in sorted(state_to_color_map.items()):
    patches.append(mpatches.Patch(color=color, label=state))
ax.legend(handles=patches, bbox_to_anchor=(1.10, 0.5), loc='center',
borderaxespad=0.)

Related

How to start Seaborn Logarithmic Barplot at y=1

I have a problem figuring out how to have Seaborn show the right values in a logarithmic barplot. Ideally, each of my values would be 1. My data series (5, 2, 1, 0.5, 0.2) deviates from unity, and I want to visualize these deviations in a logarithmic barplot. However, the standard log barplot shows the following:
But the values under one are shown to increase from -infinity to their value, whilst the real values ought to look like this:
Strangely enough, I was unable to find a Seaborn, Pandas or Matplotlib attribute to "snap" the bars to a different horizontal axis, to "align" them, or to set ymin/ymax. I have a feeling I am unable to find it because I can't find the terms to shove down my favorite search engine. The semi-solutions I found either did not match what I was looking for or lacked one of the two things I need: a horizontal axis at y = 1 and a log y-scale. Here is an attempt that uses some jank Matplotlib lines:
If someone knows the right terms or a solution, thank you in advance.
Here are the Jupyter cells I used:
{1}
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'X': ['A','B','C','D','E'], 'Y': [5,2,1,0.5,0.2]}
df = pd.DataFrame(data)
{2}
%matplotlib widget
g = sns.catplot(data=df, kind="bar", y = "Y", x = "X", log = True)
{3}
%matplotlib widget
plt.vlines(x=data['X'], ymin=1, ymax=data['Y'])
You could let the bars start at 1 instead of at 0. You'll need to use sns.barplot directly.
The example code subtracts 1 from all y-values and sets the bar bottom at 1, so each bar still ends at its original value.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [5, 2, 1, 0.5, 0.2]}
df = pd.DataFrame(data)
ax = sns.barplot(y=df["Y"] - 1, x=df["X"], bottom=1, log=True, palette='flare_r')
ax.axhline(y=1, c='k')
# change the y-ticks, as the default shows too few in this case
ax.set_yticks(np.append(np.arange(.2, .8, .1), np.arange(1, 7, 1)), minor=False)
ax.set_yticks(np.arange(.3, 6, .1), minor=True)
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.0f}' if x >= 1 else f'{x:.1f}')
ax.yaxis.set_minor_formatter(NullFormatter())
ax.bar_label(ax.containers[0], labels=df["Y"])
sns.despine()
plt.show()
PS: With these specific values, the plot might also work without a log scale:
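
A minimal sketch of what such a linear-scale variant could look like, assuming plain matplotlib `ax.bar` with `bottom=1` rather than seaborn (this is an illustration of the idea, not necessarily the exact code behind the original figure):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"X": ["A", "B", "C", "D", "E"], "Y": [5, 2, 1, 0.5, 0.2]})

fig, ax = plt.subplots()
# Each bar starts at the baseline y = 1 and has height Y - 1, so it ends at Y.
# Values below 1 get a negative height and therefore hang below the baseline.
bars = ax.bar(df["X"], df["Y"] - 1, bottom=1)
ax.axhline(y=1, color="k")
# Label the bars with the original values, not with the shifted heights.
ax.bar_label(bars, labels=[f"{v:g}" for v in df["Y"]])
```

On a linear axis the bars below 1 look much shorter than the ones above it (0.2 is a factor of five away from 1, just like 5), which is the distortion the log scale avoids.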

matplotlib stacked bar chart with zero centered

I have a dataset like below.
T/F | Value | category
----------------------
T | 1 | A
F | 3 | B
T | 5 | C
F | 7 | A
T | 8 | B
... | ... | ...
So I want to draw a bar chart like the one below: each category keeps the same position, the bars are centered on zero, the F value is drawn as a bar below the horizontal line and the T value as a bar above it.
How can I make this chart with matplotlib.pyplot, or with another library?
An example would help.
One approach involves making the False values negative, and then creating a Seaborn barplot with T/F as hue. You might want to make a copy of the data if you can't change the original.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
data = pd.DataFrame({'T/F': ['T', 'F', 'T', 'F', 'T'],
'Value': [1, 3, 5, 7, 8],
'category': ['A', 'B', 'C', 'A', 'B']})
data['Value'] = np.where(data['T/F'] == 'T', data['Value'], -data['Value'])
ax = sns.barplot(data=data, x='category', y='Value', hue='T/F', dodge=False, palette='turbo')
ax.axhline(0, lw=2, color='black')
plt.tight_layout()
plt.show()
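
Since the question asks for matplotlib.pyplot, here is a rough sketch of the same negate-the-F-values idea without seaborn. It assumes that repeated (category, T/F) pairs should be summed, which the question does not state; the `wide` table name is just for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"T/F": ["T", "F", "T", "F", "T"],
                     "Value": [1, 3, 5, 7, 8],
                     "category": ["A", "B", "C", "A", "B"]})

# One row per category, one column per T/F flag.
# Assumption: duplicate (category, T/F) pairs are summed; missing pairs become 0.
wide = data.pivot_table(index="category", columns="T/F", values="Value",
                        aggfunc="sum", fill_value=0)
wide = wide.reindex(columns=["T", "F"], fill_value=0)

fig, ax = plt.subplots()
ax.bar(wide.index, wide["T"], label="T")    # drawn above the zero line
ax.bar(wide.index, -wide["F"], label="F")   # negated, so drawn below it
ax.axhline(0, lw=2, color="black")
ax.legend()
```

Because both `bar` calls use the same x positions, the T and F bars of a category line up vertically around the zero line.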

Plotting annual mean and standard deviation in different colors for each year

I have data for several years, and I have calculated the mean and standard deviation for each year. Now I want to plot each year's mean as a scatter point and fill the band between mean minus and mean plus one standard deviation, using a different color for each year.
df_wc.set_index('Date').resample('Y')["Ratio(a/w)"].mean() labels each result with only the last date of the year (as shown in the data set below), but I want the filled standard-deviation band to span the entire year.
Sample Data set:
Date | Mean | Std_dv
1858-12-31 | 1.284273 | 0.403052
1859-12-31 | 1.235267 | 0.373283
1860-12-31 | 1.093308 | 0.183646
1861-12-31 | 1.403693 | 0.400722
That's a very good question, and it does not have an easy answer. If I have understood the problem correctly, you need a fill plot with a different colour for each year, bounded above by mean + std and below by mean - std.
So I built a custom time series, and this is how I plotted the values with their upper and lower bounds:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection,PatchCollection
from matplotlib.colors import ListedColormap, BoundaryNorm
import pandas as pd
ts = range(10)
num_classes = len(ts)
df = pd.DataFrame(data={'TOTAL': np.random.rand(len(ts)), 'Label': list(range(0, num_classes))}, index=ts)
df['UB'] = df['TOTAL'] + 2
df['LB'] = df['TOTAL'] - 2
print(df)
colors = ['r', 'g', 'b', 'y', 'purple', 'orange', 'k', 'pink', 'grey', 'violet']
cmap = ListedColormap(colors)
norm = BoundaryNorm(range(num_classes+1), cmap.N)
points = np.array([df.index, df['TOTAL']]).T.reshape(-1, 1, 2)
pointsUB = np.array([df.index, df['UB']]).T.reshape(-1, 1, 2)
pointsLB = np.array([df.index, df['LB']]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
segmentsUB = np.concatenate([pointsUB[:-1], pointsUB[1:]], axis=1)
segmentsLB = np.concatenate([pointsLB[:-1], pointsLB[1:]], axis=1)
lc = LineCollection(segments, cmap=cmap, norm=norm, linestyles='dashed')
lc.set_array(df['Label'])
lcUB = LineCollection(segmentsUB, cmap=cmap, norm=norm, linestyles='solid')
lcUB.set_array(df['Label'])
lcLB = LineCollection(segmentsLB, cmap=cmap, norm=norm, linestyles='solid')
lcLB.set_array(df['Label'])
fig1 = plt.figure()
plt.gca().add_collection(lc)
plt.gca().add_collection(lcUB)
plt.gca().add_collection(lcLB)
for i in range(len(colors)):
    plt.fill_between(df.index, df['UB'], df['LB'], where=((df.index >= i) & (df.index <= i + 1)), alpha=0.1, color=colors[i])
plt.xlim(df.index.min(), df.index.max())
plt.ylim(-3.1, 3.1)
plt.show()
And the result dataframe obtained looks like this:
TOTAL Label UB LB
0 0.681455 0 2.681455 -1.318545
1 0.987058 1 2.987058 -1.012942
2 0.212432 2 2.212432 -1.787568
3 0.252284 3 2.252284 -1.747716
4 0.886021 4 2.886021 -1.113979
5 0.369499 5 2.369499 -1.630501
6 0.765192 6 2.765192 -1.234808
7 0.747923 7 2.747923 -1.252077
8 0.543212 8 2.543212 -1.456788
9 0.793860 9 2.793860 -1.206140
And the plot looks like this:
Let me know if this helps! :)
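
Applied to the yearly sample data from the question, the same `fill_between` idea could be sketched as follows. This is only one possible reading of the request: it assumes each band should run from 1 January to 31 December of its year and that the mean is drawn as a single point in the middle of the year. The `stats` frame simply re-types the four sample rows:

```python
import matplotlib.pyplot as plt
import pandas as pd

# The four sample rows from the question (year-end dates from resample('Y')).
stats = pd.DataFrame({
    "Date": pd.to_datetime(["1858-12-31", "1859-12-31", "1860-12-31", "1861-12-31"]),
    "Mean": [1.284273, 1.235267, 1.093308, 1.403693],
    "Std_dv": [0.403052, 0.373283, 0.183646, 0.400722],
})

fig, ax = plt.subplots(figsize=(8, 3))
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
for i, row in enumerate(stats.itertuples(index=False)):
    # Assumption: the band covers the whole calendar year of the row's date.
    start = pd.Timestamp(year=row.Date.year, month=1, day=1)
    end = pd.Timestamp(year=row.Date.year, month=12, day=31)
    color = colors[i % len(colors)]
    # Band between mean - std and mean + std, one color per year.
    ax.fill_between([start, end], row.Mean - row.Std_dv, row.Mean + row.Std_dv,
                    color=color, alpha=0.3)
    # Mean as a single scatter point placed mid-year.
    ax.scatter([start + (end - start) / 2], [row.Mean], color=color,
               label=str(row.Date.year))
ax.legend()
```

Each year gets its own rectangle-shaped band, so neighbouring years do not blend into each other the way they do when one polygon is filled across all points.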

Is there a way to use multiple subplots for the same graphic using Seaborn?

I'd like to have two plots (a scatter and a bar) shown on the same figure, with the following layout:
| 0 | 1 | 2 | 3 |
But where 0 through 2 are filled with the first plot, and 3 is filled with the second. I've tried to use fig, ax = plt.subplots(1,4,figsize = (12, 10)) to create a 1x4 array of subplots, but when I call sns.scatterplot(..., ax=...), the ax argument is only able to accept one subplot label.
Is there a way, either in the subplot call or in the ax argument, to make a plot that is 75% of the width?
This is one way to do it with plt.subplots(): use the keyword gridspec_kw, which takes a dictionary that is passed to the GridSpec constructor used to create the grid on which the subplots are placed.
import numpy as np
from collections import Counter
import matplotlib.pyplot as plt
np.random.seed(123)
data = np.random.randint(0, 10, 100)
x, y = zip(*Counter(data).items())
fig, (ax1, ax2) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [3, 1]},
figsize=(10, 4))
ax1.scatter(x, y)
ax2.bar(x, y)
plt.tight_layout()
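
An alternative sketch that follows the `| 0 | 1 | 2 | 3 |` layout from the question literally: build a 1x4 grid with `Figure.add_gridspec` and let one axes span the first three cells. The names `ax_wide` and `ax_narrow` are made up for this example, and the random data is only a stand-in:

```python
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: counts of random integers between 0 and 9.
rng = np.random.default_rng(123)
data = rng.integers(0, 10, 100)
x, y = zip(*sorted(Counter(data.tolist()).items()))

fig = plt.figure(figsize=(10, 4))
gs = fig.add_gridspec(nrows=1, ncols=4)
ax_wide = fig.add_subplot(gs[0, :3])   # spans grid cells 0, 1 and 2
ax_narrow = fig.add_subplot(gs[0, 3])  # grid cell 3 only
ax_wide.scatter(x, y)
ax_narrow.bar(x, y)
fig.tight_layout()
```

Axes-level seaborn functions accept a single Axes through `ax=`, so a call such as `sns.scatterplot(..., ax=ax_wide)` should draw into the wide axes in the same way.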

Geopandas plots as subfigures

Say I have the following geodataframe that contains 3 polygon objects.
import geopandas as gpd
from shapely.geometry import Polygon
p1=Polygon([(0,0),(0,1),(1,1),(1,0)])
p2=Polygon([(3,3),(3,6),(6,6),(6,3)])
p3=Polygon([(3,.5),(4,2),(5,.5)])
gdf=gpd.GeoDataFrame(geometry=[p1,p2,p3])
gdf['Value1']=[1,10,20]
gdf['Value2']=[300,200,100]
gdf content:
>>> gdf
geometry Value1 Value2
0 POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0)) 1 300
1 POLYGON ((3 3, 3 6, 6 6, 6 3, 3 3)) 10 200
2 POLYGON ((3 0.5, 4 2, 5 0.5, 3 0.5)) 20 100
>>>
I can make a separate figure for each plot by calling geopandas.plot() twice. However, is there a way for me to plot both of these maps next to each other in the same figure as subfigures?
Always always always create your matplotlib objects ahead of time and pass them to the plotting methods (or use them directly). Doing so, your code becomes:
from matplotlib import pyplot
import geopandas
from shapely import geometry
p1 = geometry.Polygon([(0,0),(0,1),(1,1),(1,0)])
p2 = geometry.Polygon([(3,3),(3,6),(6,6),(6,3)])
p3 = geometry.Polygon([(3,.5),(4,2),(5,.5)])
gdf = geopandas.GeoDataFrame(dict(
geometry=[p1, p2, p3],
Value1=[1, 10, 20],
Value2=[300, 200, 100],
))
fig, (ax1, ax2) = pyplot.subplots(ncols=2, sharex=True, sharey=True)
gdf.plot(ax=ax1, column='Value1')
gdf.plot(ax=ax2, column='Value2')
Which gives me:
# For plotting multiple GeoDataFrame maps in one figure
import geopandas as gpd
import matplotlib.pyplot as plt
gdf = gpd.read_file(geojson)  # geojson: path to your GeoJSON file
fig, axes = plt.subplots(1, 4, figsize=(40, 10))
axes[0].set_title('Some Title')
gdf.plot(ax=axes[0], column='Some column for coloring', cmap='coloring option')
# Use the next axes for the next map; repeat for axes[2] and axes[3] as needed.
axes[1].set_title('Some Title')
gdf.plot(ax=axes[1], column='Some column for coloring', cmap='coloring option')