matplotlib asymmetric errorbar showing wrong information - matplotlib

I am trying to plot a grouped bar plot with asymmetric error bars. When the error bars are symmetric, the chart is correct. For the asymmetric version, however, the length of the error bars is wrong.
Here is a minimal reproducible example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# test with code from documentation
men_means, men_std = (20, 35, 30, 35, 27), (2, 3, 4, 1, 2)
women_means, women_std = (25, 32, 34, 20, 25), (3, 5, 2, 3, 3)
# dummy dataframe similar to what I will be using
avg = [20, 35, 30, 35, 27]
men_std_l = [19,33,28,34,25]
men_std_u = [22,37,31,39,29]
df = pd.DataFrame({'avg' :avg, 'low':men_std_l, 'high':men_std_u})
ind = np.arange(df.shape[0]) # the x locations for the groups
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(ind - width/2, df['avg'], width, yerr=[df['low'].values,df['high'].values], label='Men')
rects2 = ax.bar(ind + width/2, women_means, width, yerr=women_std,label='Women')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('error bar is wrong for asymmetrical, correct otherwise')
ax.legend()
fig.tight_layout()
plt.show()
I have tried the solutions from "Asymmetrical errorbar with pandas" (which gives ValueError: In safezip, len(args[0])=5 but len(args[1])=1) and "plotting asymmetric errorbars using matplotlib" (which gives TypeError: Cannot cast array data from dtype('<U1') to dtype('float64') according to the rule 'safe').
Any help is much appreciated.

Answering my own question, as I could not work out from the documentation what the lower and upper error values were supposed to be. In hindsight, it would have been clearer if I were not so used to ggplot in R.
Matplotlib's asymmetric error bars take the amounts to subtract from and add to the height of the bars, not the absolute lower and upper values. In other words, yerr expects distances from the top of each bar, so I needed the following:
xel = df['avg'].values - df['low'].values
xeh = df['high'].values - df['avg'].values
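For completeness, here is a minimal sketch of how these two arrays might be passed to ax.bar, reusing the dummy data from the question. Stacking them into a (2, N) array (lower distances first, upper distances second) is the layout yerr expects for asymmetric bars; the capsize value and the Agg backend line are my own additions, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "avg": [20, 35, 30, 35, 27],
    "low": [19, 33, 28, 34, 25],
    "high": [22, 37, 31, 39, 29],
})

# Convert absolute bounds into distances from the bar height.
xel = df["avg"].values - df["low"].values    # distance below the bar top
xeh = df["high"].values - df["avg"].values   # distance above the bar top
yerr = np.vstack([xel, xeh])                 # shape (2, N): row 0 = lower, row 1 = upper

ind = np.arange(len(df))
width = 0.35

fig, ax = plt.subplots()
ax.bar(ind - width / 2, df["avg"], width, yerr=yerr, capsize=3, label="Men")
ax.set_ylabel("Scores")
ax.legend()
fig.tight_layout()
```

Both rows of yerr must be non-negative, which is a quick sanity check that "low" and "high" really bracket "avg".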

Related

Setting independent colorbar scale to y-values of plot using matplotlib and proplot

I have a series of histograms that I plot over the top of each other using a for loop:
import numpy as np
import matplotlib.pyplot as plt
import proplot as pplt
cmap = colormap
fig = pplt.figure(figsize=(12, 10), dpi=300)
jj = [4, 3, 2, 1, 0]
for j in jj:
    plt.fill_between(p[:, j], s[:, j], y2=0, alpha=0.6, color=colormap[:, 4-j], edgecolor=[0, 0, 0], linewidth=1.5)
The colormap in question is a manually specified list of RGB triplets (from Fabio Crameri's 'lajolla' map):
0.64566 0.823453 0.895061 0.924676 0.957142
0.277907 0.386042 0.526882 0.657688 0.803006
0.259453 0.301045 0.317257 0.331596 0.408285
Each color corresponds to data recorded under different conditions. I want the colorbar to have manually specified ticks corresponding to this variable (e.g. c = 30, 35, 40, 45, 50), but I can't seem to configure the colormap so that it does not just use the indices of the cmap matrix (0, 1, 2, 3, 4) as the values of the mapped variable. Setting ticks outside of this range just results in them not being shown.
cbar = fig.colorbar(np.transpose(cmap))
cbar.set_ticks([30, 35, 40, 45, 50])
cbar.set_ticklabels([30, 35, 40, 45, 50])
Any idea how I can resolve this?
I tried shifting the indices of the colormap, but this doesn't seem to work.
I am trying to get a colorbar with ticks corresponding to the 30, 35, 40, 45, 50 values quoted above.
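One possible approach, sketched here with plain Matplotlib rather than proplot (whose colorbar wrapper may behave differently), is to build a ListedColormap from the RGB triplets and pair it with a BoundaryNorm whose bin edges sit halfway between the desired tick values. The assumption that column k of the matrix belongs to the k-th value in 30, 35, 40, 45, 50 is mine, as are the edge values and the label text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import BoundaryNorm, ListedColormap

# RGB triplets from the question, one column per condition (shape 3 x 5).
colormap = np.array([
    [0.64566, 0.823453, 0.895061, 0.924676, 0.957142],
    [0.277907, 0.386042, 0.526882, 0.657688, 0.803006],
    [0.259453, 0.301045, 0.317257, 0.331596, 0.408285],
])
values = [30, 35, 40, 45, 50]  # assumed: values[k] belongs to column k

cmap = ListedColormap(colormap.T)  # ListedColormap wants one RGB row per color
# Bin edges halfway between the values, so each tick sits in the middle of its color.
edges = [27.5, 32.5, 37.5, 42.5, 47.5, 52.5]
norm = BoundaryNorm(edges, cmap.N)

fig, ax = plt.subplots()
# ... draw the fill_between curves on ax here ...
sm = ScalarMappable(norm=norm, cmap=cmap)  # carries the value-to-color mapping
cbar = fig.colorbar(sm, ax=ax, ticks=values)
cbar.set_label("c")
```

The colorbar is driven by the ScalarMappable alone, so it is independent of whatever is drawn on the axes.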

How can I use matplotlib.pyplot to customize geopandas plots?

What is the difference between geopandas plots and matplotlib plots? Why are not all keywords available?
In geopandas there is markersize, but not markeredgecolor...
In the example below I plot a pandas df with some styling, then transform the pandas df to a geopandas df. Simple plotting is working, but no additional styling.
This is just an example. In my geopandas plots I would like to customize markers, legends, etc. How can I access the relevant matplotlib objects?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)
df = pd.DataFrame(Y, X)
plt.plot(X,Y,linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
# alternatively:
# df.plot(linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
plt.show()
# create GeoDataFrame from df
df.reset_index(inplace=True)
df.rename(columns={'index': 'Y', 0: 'X'}, inplace=True)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Y'], df['X']))
gdf.plot(linewidth = 3., color = 'k', markersize = 9) # working
gdf.plot(linewidth = 3., color = 'k', markersize = 9, markeredgecolor = 'k') # not working
plt.show()
You're probably confused by the fact that both libraries name the method .plot(). In Matplotlib, that call creates an mpl.lines.Line2D object, which also holds the markers and their styling.
Geopandas assumes you want to plot geographic data and uses paths for this (mpl.collections.PathCollection). A PathCollection has, for example, face and edge colors, but no markers. The facecolor comes into play whenever a path closes and forms a polygon (yours doesn't, making it "just" a line).
Geopandas seems to use a bit of a trick for points/markers: it appears to draw each one as a "path" using the "CURVE4" code (cubic Bézier).
You can explore what's happening if you capture the axes that geopandas returns:
ax = gdf.plot(...
Using ax.get_children() you get all artists that have been added to the axes. Since this is a simple plot, it's easy to see that the PathCollection is the actual data; the other artists draw the axis, spines, etc.
[<matplotlib.collections.PathCollection at 0x1c05d5879d0>,
<matplotlib.spines.Spine at 0x1c05d43c5b0>,
<matplotlib.spines.Spine at 0x1c05d43c4f0>,
<matplotlib.spines.Spine at 0x1c05d43c9d0>,
<matplotlib.spines.Spine at 0x1c05d43f1c0>,
<matplotlib.axis.XAxis at 0x1c05d036590>,
<matplotlib.axis.YAxis at 0x1c05d43ea10>,
Text(0.5, 1.0, ''),
Text(0.0, 1.0, ''),
Text(1.0, 1.0, ''),
<matplotlib.patches.Rectangle at 0x1c05d351b10>]
If you reduce the number of points a lot, say to 5 instead of 1024, retrieving the drawn Path shows the coordinates and also the codes used:
pcoll = ax.get_children()[0] # the first artist is the PathCollection
path = pcoll.get_paths()[0] # it only contains 1 Path
print(path.codes) # show the codes used.
# array([ 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
# 4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8)
Some more info about how these paths work can be found at:
https://matplotlib.org/stable/tutorials/advanced/path_tutorial.html
So, long story short: you do have all the same keywords as when using Matplotlib, but they are the keywords for a PathCollection and not for the Line2D object that you might expect.
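The difference between the two artist types can be seen in Matplotlib alone, without geopandas. The sketch below (my own illustration, not taken from the answer) draws the same points once with ax.plot and once with ax.scatter, and shows which styling keywords each one takes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection
from matplotlib.lines import Line2D

x = np.linspace(-6, 6, 5)
y = np.sinc(x)

fig, ax = plt.subplots()

# ax.plot returns Line2D objects, which understand the marker* keywords.
(line,) = ax.plot(x, y, marker="o", markeredgecolor="k", markerfacecolor="0.75")

# ax.scatter returns a PathCollection, styled with c / edgecolors / linewidths.
points = ax.scatter(x, y + 1, s=81, c="0.75", edgecolors="k", linewidths=1.5)

# A collection can also be restyled afterwards through its setters.
points.set_edgecolor("red")
```

The same setters should be available on a PathCollection picked out of ax.get_children() after a geopandas plot, assuming it is the first child as in the listing above.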
You can always flip the order around: start with a Matplotlib figure/axes that you create yourself, and pass that axes to Geopandas when you want to plot something. That might be easier or more intuitive when you (also) want to plot other things in the same axes. It does perhaps require a bit more discipline to make sure the (spatial) coordinates etc. match.
I personally almost always do that, because it lets me do most of the plotting with the same Matplotlib APIs. Admittedly, those have a slightly steeper learning curve, but overall I find that easier than dealing with the slightly different interpretation of every package that uses Matplotlib under the hood (e.g. geopandas, seaborn, xarray). That really depends on where you're coming from, though.
Thank you for your detailed answer. Based on it, I came up with this simplified code from my real project.
I have a shapefile shp and some point data df which I want to plot. shp is plotted with geopandas, df with matplotlib.pyplot. There is no need to convert the point data into a GeoDataFrame gdf as I did initially.
# read marker data (places with coordinates)
df = pd.read_csv("../obese_pct_by_place.csv")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['sweref99_lng'], df['sweref99_lat']))
# read shapefile
shp = gpd.read_file("../../SWEREF_Shapefiles/KommunSweref99TM/Kommun_Sweref99TM_region.shp")
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_aspect('equal')
shp.plot(ax=ax)
# plot obesity markers
# geopandas, no edgecolor here
# gdf.plot(ax=ax, marker='o', c='r', markersize=gdf['obese'] * 25)
# matplotlib.pyplot with edgecolor
plt.scatter(df['sweref99_lng'], df['sweref99_lat'], c='r', edgecolor='k', s=df['obese'] * 25)
plt.show()

changing the spacing between tick labels on Matplotlib Heatmap

I am using the following code to generate this heatmap:
h= np.vstack((aug2014, sep2014,oct2014, nov2014, dec2014, jan2015, feb2015, mar2015, apr2015, may2015, jun2015, jul2015, aug2015))
dim = np.arange(1, 32, 1)
fig, ax = plt.subplots(figsize=(9,3))
heatmap = ax.imshow(h, cmap=plt.get_cmap('Blues', 4), aspect=0.5, clim=[1,144])
cbar = fig.colorbar(heatmap, ticks = [1, 36, 72, 108, 144], label = 'Number of valid records per day')
ax.set_xlabel("Days", fontsize=15)
ax.set_ylabel("Months", fontsize=15)
ax.set_title("Number of valid records per day", fontsize=20)
ax.set_xticks(range(0,31))
ax.set_xticklabels(dim, rotation=45, ha='center', minor=False)
ax.set_yticks(range(0,13,1))
ax.set_yticklabels(ylabel[7:20])
ax.grid(which = 'minor', color = 'w')
ax.set_facecolor('gray')
fig.show()
As you can see, the labels on the y-axis are not very readable. Is there a way to either increase the size of the grid cells or change the scale of the axis to increase the space between the labels? I have tried changing figsize, but all it did was make the colorbar much bigger than the heatmap. I also have two subsidiary questions:
Can someone explain to me why the grid lines do not show on the figure although I have defined them?
How can I increase the font of the colorbar title?
Any help would be welcomed!
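A hedged sketch of one way to address all three points, using random stand-in data because the monthly arrays are not shown: aspect="auto" lets the cells fill a taller figure, minor ticks placed on the cell edges give grid(which='minor') something to draw, and cbar.set_label accepts a fontsize. The figure size, font size and random data are my assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 13 months x 31 days of counts between 1 and 144 (hypothetical).
rng = np.random.default_rng(0)
h = rng.integers(1, 145, size=(13, 31))

fig, ax = plt.subplots(figsize=(9, 5))
# aspect="auto" lets the cells stretch to fill the axes instead of using a fixed ratio.
heatmap = ax.imshow(h, cmap=plt.get_cmap("Blues", 4), aspect="auto", vmin=1, vmax=144)

ax.set_xticks(np.arange(31))
ax.set_xticklabels(np.arange(1, 32), rotation=45)
ax.set_yticks(np.arange(13))

# Minor ticks on the cell edges; grid(which="minor") draws nothing without them.
ax.set_xticks(np.arange(-0.5, 31, 1), minor=True)
ax.set_yticks(np.arange(-0.5, 13, 1), minor=True)
ax.grid(which="minor", color="w", linewidth=1)
ax.tick_params(which="minor", length=0)  # hide the minor tick marks themselves

cbar = fig.colorbar(heatmap, ticks=[1, 36, 72, 108, 144])
cbar.set_label("Number of valid records per day", fontsize=14)
```

In the original code no minor tick locations were ever set, which is a likely reason the minor grid did not appear.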

Plotting points with different colors using corresponding list of labels

I have the following matrix and vector of labels:
The idea is to plot the points in points, colored according to the labels (1 and -1) in y. Assume that the function true_label works.
M = [5, 10, 15, 25, 70]
for m in M:
    points = np.random.multivariate_normal(np.zeros(2), np.eye(2), m)
    true_labels = true_label(points)
    y = np.where(true_labels, 1, -1)
    fig, ax = plt.subplots(1, 1)
    colors = ['green', 'red', 'blue']
    plt.plot(points, c=y, cmap=matplotlib.colors.ListedColormap(colors))
    # red is 1, blue is -1
    plt.show()
However, I can't get this to work. I keep getting:
AttributeError: Unknown property cmap
I've updated matplotlib, so I don't really understand why this doesn't work. Any advice on how to get this done easily?
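The error is consistent with plt.plot creating Line2D objects, which have a single color and no cmap property. A minimal sketch of an alternative is ax.scatter, which does accept c together with cmap. The true_label function is not shown in the question, so the rule used below (x > 0) is a hypothetical stand-in, and the two-color map is my simplification of the three-color list.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

rng = np.random.default_rng(0)
points = rng.multivariate_normal(np.zeros(2), np.eye(2), 25)

# Hypothetical stand-in for true_label: points with x > 0 are labelled True.
true_labels = points[:, 0] > 0
y = np.where(true_labels, 1, -1)

fig, ax = plt.subplots()
# Two colors for two classes: -1 maps to blue, 1 maps to red.
cmap = ListedColormap(["blue", "red"])
sc = ax.scatter(points[:, 0], points[:, 1], c=y, cmap=cmap, vmin=-1, vmax=1)
```

Passing vmin and vmax pins the mapping, so the colors stay the same even if one of the classes happens to be empty for a small sample.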

Multicolored graph based on data frame values

I'm plotting a chart based on the data frame below. I want to show the graph line in a different colour depending on the column Condition. I'm trying the following code, but it shows only one colour throughout the graph.
df = pd.DataFrame(dict(
    Day=pd.date_range('2018-01-01', periods=60, freq='D'),
    Utilisation=np.random.rand(60) * 100))
df = df.astype(dtype={"Utilisation": "int64"})
df['Condition'] = np.where(df.Utilisation < 10, 'Winter',
                           np.where(df.Utilisation < 30, 'Summer', 'Spring'))
condition_map = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
df[['Utilisation', 'Day']].set_index('Day').plot(figsize=(10, 4), rot=90,
                                                 color=df.Condition.map(condition_map))
So, I assume you want a graph for each condition.
I would use groupby to separate the data.
# Color setting
season_color = {'Winter': 'r', 'Summer': 'k', 'Spring': 'b'}
# Create figure and axes
f, ax = plt.subplots(figsize = (10, 4))
# Loop over and plot each group of data
for cond, data in df.groupby('Condition'):
    ax.plot(data.Day, data.Utilisation, color = season_color[cond], label = cond)
# Fix datelabels
f.autofmt_xdate()
f.legend()
f.show()
If you truly want the date ticks to be rotated 90 degrees, use autofmt_xdate(rotation = 90)
Update:
If you want to plot everything as a single line, it's a bit trickier, since a line can only have one color associated with it.
You could plot a line between each pair of points and split a line wherever it crosses a "color boundary", or check out this pyplot example: multicolored line
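As a rough sketch of the multicolored-line idea, a LineCollection can hold one segment per pair of consecutive points and color each segment through a colormap and norm. This version colors every segment by its starting value and does not split segments at the color boundaries, which is a simplification. The random data, the seed and the reuse of the r/k/b colors are my assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection
from matplotlib.colors import BoundaryNorm, ListedColormap

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Day": pd.date_range("2018-01-01", periods=60, freq="D"),
    "Utilisation": rng.integers(0, 100, 60),
})

x = mdates.date2num(df["Day"].to_numpy())  # dates as Matplotlib float days
y = df["Utilisation"].to_numpy(dtype=float)

# One segment per pair of consecutive points: shape (N - 1, 2, 2).
pts = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([pts[:-1], pts[1:]], axis=1)

cmap = ListedColormap(["r", "k", "b"])           # Winter, Summer, Spring
norm = BoundaryNorm([0, 10, 30, 100], cmap.N)    # same thresholds as Condition

lc = LineCollection(segments, cmap=cmap, norm=norm)
lc.set_array(y[:-1])  # color each segment by the value at its start

fig, ax = plt.subplots(figsize=(10, 4))
ax.add_collection(lc)
ax.xaxis_date()        # interpret x values as dates
ax.autoscale_view()    # collections do not trigger autoscaling on their own
fig.autofmt_xdate()
```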
Another possibility is to plot a lot of scatter points between each pair of points and create your own colormap that represents your color boundaries.
To create a colormap (and norm), I use from_levels_and_colors:
import matplotlib.colors
colors = ['#00BEC5', '#a0c483', '#F9746A']
boundaries = [0, 10, 30, 100]
cm, nrm = matplotlib.colors.from_levels_and_colors(boundaries, colors)
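To illustrate what the returned pair does, here is a small check with a few sample values of my own choosing: the norm turns a data value into a bin index, and the colormap turns that index into a color.

```python
import matplotlib.colors

colors = ['#00BEC5', '#a0c483', '#F9746A']
boundaries = [0, 10, 30, 100]
cm, nrm = matplotlib.colors.from_levels_and_colors(boundaries, colors)

# The norm maps each value to the index of the interval it falls into.
samples = [5, 20, 50]
indices = [int(nrm(v)) for v in samples]

# The colormap maps each index to an RGBA tuple.
rgba = [cm(i) for i in indices]
```

With three colors and four boundaries there is exactly one color per interval, so no extend argument is needed.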
To connect each point with the next, you could shift the dataframe, but here I just zip the original df with a sliced version:
from itertools import islice
f, ax = plt.subplots(figsize = (10,4))
for (i,d0), (i,d1) in zip(df.iterrows(), islice(df.iterrows(), 1, None)):
    d_range = pd.date_range(d0.Day, d1.Day, freq = 'h')
    y_val = np.linspace(d0.Utilisation, d1.Utilisation, d_range.size)
    ax.scatter(d_range, y_val, c = y_val, cmap = cm, norm = nrm)
f.autofmt_xdate()
f.show()