How to add scientific labels to histogram plots?

I'm trying to print labels in scientific notation above each bar, matching my y-axis, which already uses scientific notation. I've tried "{:.2E}".format, but that doesn't work because I'm trying to convert an entire array (count). Any ideas?
xx = np.arange(0, 10)
labels = [0.2, 0.1, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
plt.figure(figsize=(7, 4.5))
plt.bar(xx, count)
plt.xlabel('base.distance')
plt.xticks(xx, labels)
addlabels(xx, count.astype('int32'))
plt.ticklabel_format(axis='y', style='sci', scilimits=(0, 0))
plt.ylabel('Count')
plt.title(file)
plt.show()

Bar plots have a helper method, Axes.bar_label, that adds labels and can format them, but the docs for it show the old style of format string (often called printf formatting, for example fmt='%2.1e'). If you want all the control of str.format-style format strings, you can build the list of labels yourself and pass it via labels=. Try this:
import numpy as np
import matplotlib.pyplot as plt
import random
xx= np.arange(0,10)
labels = [0.2,0.1,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0]
count = random.sample(range(3000,14000), len(xx))
fig, ax = plt.subplots(figsize=(7, 4.5))
barplot = ax.bar(xx, count)
plt.xlabel('base.distance')
plt.xticks(xx, labels)
#ax.bar_label(barplot) # the whole numbers
#ax.bar_label(barplot, fmt='%2.1e') #old school
ax.bar_label(barplot, labels=['{:.2E}'.format(x) for x in count])  # one formatted string per bar
plt.ticklabel_format(axis='y', style='sci', scilimits=(0,0))
plt.ylabel('Count')
plt.show()
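
The attempt in the question fails because a format spec such as .2E applies to a single number, not to a whole NumPy array; formatting element by element avoids that. A minimal sketch of the idea, where the helper name sci_labels and the sample counts are illustrative and not part of the original answer:

```python
def sci_labels(values, digits=2):
    """Format each value separately in scientific notation."""
    # float() makes this work for Python ints and NumPy scalars alike
    return [f"{float(v):.{digits}E}" for v in values]


counts = [3000, 12345, 987654]
print(sci_labels(counts))  # ['3.00E+03', '1.23E+04', '9.88E+05']
```

The resulting list can be passed as labels= to ax.bar_label in the code above.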

Related

Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is oriented horizontally and, to my knowledge, there is no equivalent layout parameter to make them vertical. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
x_vars = X_train.select_dtypes(exclude=['object']).columns,
y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()

What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()
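
For readers unfamiliar with "long form", here is a small sketch of what melt does to a wide dataframe. The column names and values are made up for illustration; only the SalePrice name comes from the question.

```python
import pandas as pd

# a tiny wide-form frame: one column per variable (illustrative values)
wide = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "SalePrice": [208500, 181500, 223500],
})

# long form: one row per (SalePrice, variable, value) combination
long_df = wide.melt(id_vars="SalePrice", value_vars=["LotArea", "YearBuilt"])
print(long_df)
```

Each original column becomes a block of rows labelled in the 'variable' column, which is what col='variable' with col_wrap=4 uses to lay out the grid.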

Add text flush left below plot in python

I'd like to add text beneath a plot, which includes the source of the used data.
It should be positioned at the edge of the image, so beneath the longest ytick and if possible at a fixed vertical distance to the x-axis.
My approach:
import matplotlib.pyplot as plt
country = ['Portugal','Spain','Austria','Italy','France','Federal Republic of Germany']
value = [6,8,10,12,14,25]
plt.figure(figsize=(4,4))
plt.barh(country,value)
plt.xlabel('x-axis')
plt.text(-18,-2.5,'Source: blablablablablablablablablablablablablablablablabla',ha='left')
I used plt.text(). My problem with this command is that I have to find the x and y values (in the code: -18, -2.5) by trial and error for each plot.
Is there a better way?
Thanks in advance.

First, I get the bounding boxes of the y tick labels and take the leftmost x position among them. Then I use a blended transform (x in display pixels, y in axes coordinates) to place the text, with small offsets for fine adjustment.
import matplotlib.pyplot as plt
from matplotlib.transforms import IdentityTransform
import matplotlib.transforms as transforms
country = ['Portugal','Spain','Austria','Italy','France','Federal Republic of Germany']
value = [6,8,10,12,14,25]
plt.figure(figsize=(4,4))
plt.barh(country,value)
plt.xlabel('x-axis')
ax = plt.gca()
fig =plt.gcf()
fig.tight_layout()
fig.canvas.draw()
labs = ax.get_yticklabels()
xlocs = []
for ilab in labs:
    xlocs.append(ilab.get_window_extent().x0)
print(xlocs)
x0 = min(xlocs)
trans = transforms.blended_transform_factory(IdentityTransform(), ax.transAxes)
plt.text(x0-2.5,-0.2,'Source: blablablablablablablablablablablablablablablablabla',ha='left',transform=trans)
plt.savefig("flush.png",bbox_inches="tight")
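
The same idea can be wrapped in a small helper. This is only a sketch under some assumptions: the function name add_source_note and the 30-point offset are hypothetical, and it uses ax.annotate with mixed coordinate systems (x in figure pixels, y in axes fraction) instead of the blended transform above. The pixel position is computed at the figure's current dpi, so saving at a different dpi may shift the text.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt


def add_source_note(ax, text, offset_pts=30):
    """Place text flush with the leftmost y tick label, below the axes."""
    fig = ax.figure
    fig.canvas.draw()  # tick-label extents are only valid after a draw
    x0 = min(lab.get_window_extent().x0 for lab in ax.get_yticklabels())
    return ax.annotate(
        text,
        xy=(x0, 0), xycoords=("figure pixels", "axes fraction"),
        xytext=(0, -offset_pts), textcoords="offset points",  # fixed gap in points
        ha="left", va="top",
    )


fig, ax = plt.subplots(figsize=(4, 4))
ax.barh(["Portugal", "Spain", "Federal Republic of Germany"], [6, 8, 25])
ax.set_xlabel("x-axis")
fig.tight_layout()
note = add_source_note(ax, "Source: blablabla")
```

Because the vertical offset is given in points relative to the bottom of the axes, the distance to the x-axis stays the same for differently sized plots.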

Align multi-line ticks in Seaborn plot

I have the following heatmap:
I've broken up the category names by each capital letter and then capitalised them. This achieves a centering effect across the labels on my x-axis by default which I'd like to replicate across my y-axis.
yticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.index]
xticks = [re.sub("(?<=.{1})(.?)(?=[A-Z]+)", "\\1\n", label, 0, re.DOTALL).upper() for label in corr.columns]
fig, ax = plt.subplots(figsize=(20,15))
sns.heatmap(corr, ax=ax, annot=True, fmt="d",
cmap="Blues", annot_kws=annot_kws,
mask=mask, vmin=0, vmax=5000,
cbar_kws={"shrink": .8}, square=True,
linewidths=5)
for p in ax.texts:
    myTrans = p.get_transform()
    offset = mpl.transforms.ScaledTranslation(-12, 5, mpl.transforms.IdentityTransform())
    p.set_transform(myTrans + offset)
plt.yticks(plt.yticks()[0], labels=yticks, rotation=0, linespacing=0.4)
plt.xticks(plt.xticks()[0], labels=xticks, rotation=0, linespacing=0.4)
where corr represents a pre-defined pandas dataframe.
I couldn't seem to find an align parameter for setting the ticks and was wondering if and how this centering could be achieved in seaborn/matplotlib?

I've adapted the seaborn correlation plot example below.
from string import ascii_letters
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
# Generate a large random dataset
rs = np.random.RandomState(33)
d = pd.DataFrame(data=rs.normal(size=(100, 7)),
columns=['Donald\nDuck','Mickey\nMouse','Han\nSolo',
'Luke\nSkywalker','Yoda','Santa\nClause','Ronald\nMcDonald'])
# Compute the correlation matrix
corr = d.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
square=True, linewidths=.5, cbar_kws={"shrink": .5})
for i in ax.get_yticklabels():
    i.set_ha('right')
    i.set_rotation(0)
for i in ax.get_xticklabels():
    i.set_ha('center')
Note the two for loops above. They get each tick label and set its horizontal alignment. (You can also change the vertical alignment with set_va().)
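
If the goal is to centre the lines of a multi-line label relative to each other while the label as a whole stays right-aligned at the tick, Matplotlib's Text objects also have a multialignment property. A minimal sketch, using a plain imshow grid in place of the seaborn heatmap (an assumption made to keep the example small):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

names = ["Donald\nDuck", "Luke\nSkywalker", "Yoda"]
fig, ax = plt.subplots()
ax.imshow(np.arange(9).reshape(3, 3), cmap="Blues")
ax.set_xticks(range(3))
ax.set_xticklabels(names)
ax.set_yticks(range(3))
ax.set_yticklabels(names)

for lab in ax.get_yticklabels():
    lab.set_ha("right")               # anchor the text block at the tick
    lab.set_multialignment("center")  # centre the lines within the block
    lab.set_rotation(0)
```

Here ha controls where the whole text block sits relative to the tick, and multialignment controls how the individual lines are aligned inside that block.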

set_xlim() does not work with text labels

I am trying to zoom in on a geopandas map with labels using set_xlim() with matplotlib. I basically adapted this SO question to add labels to a map.
However, set_xlim() does not seem to work and did not zoom in on the given extent. (By the way, I've also tried to use text() instead of annotate(), to no avail.)
What I did was the following:
I used the same US county data as in the question linked above, extracted the files, and then executed the following in a Jupyter notebook:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
shpfile='shp/cb_2015_us_county_20m.shp'
gdf=gpd.read_file(shpfile)
gdf.plot()
, which gives a map of all US counties as expected:
Adding labels as with one of the answers also works:
ax = gdf.plot()
gdf.apply(lambda x: ax.annotate(text=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1);
However, when trying to zoom in to a particular geographic extent with set_xlim() and set_ylim() as follows:
ax = gdf.plot()
gdf.apply(lambda x: ax.annotate(text=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1);
ax.set_xlim(-84.2, -83.4)
ax.set_ylim(42, 42.55)
, the two functions do not seem to work. Instead of zooming in, they just trimmed everything outside of the given extent.
If the labeling code (the gdf.apply(...) line) is dropped, set_xlim() works as expected:
My question is:
What is the correct way to zoom in to an area when labels are present in a plot?

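You may need a coordinate transformation if the axes use a map projection. Note that Matplotlib's set_xlim() and set_ylim() do not take a CRS argument; on a cartopy GeoAxes, the method that accepts one is set_extent():
import cartopy.crs as ccrs
# relevant code follows; `ax` must be a cartopy GeoAxes here
# extent is [lon_min, lon_max, lat_min, lat_max] in degrees
ax.set_extent([-84.2, -83.4, 42, 42.55], crs=ccrs.PlateCarree())
plt.show()
With crs=ccrs.PlateCarree(), the input values are interpreted as longitude and latitude and transformed to the proper data coordinates.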

When I tried it, I could not get matplotlib to draw correctly with the axes restricted either. An alternative is to extract only the data inside the region of interest before plotting:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
fig,ax = plt.subplots(1,1, figsize=(4,4), dpi=144)
shpfile = './cb_2015_us_county_20m/cb_2015_us_county_20m.shp'
gdf = gpd.read_file(shpfile)
# gdf = gdf.loc[gdf['STATEFP'] == '27']
gdf['coords'] = gdf['geometry'].apply(lambda x: x.representative_point().coords[:])
gdf['coords'] = [coords[0] for coords in gdf['coords']]
gdf = (gdf[(gdf['coords'].str[0] >= -84.2) & (gdf['coords'].str[0] <= -83.4)
& (gdf['coords'].str[1] >= 42) & (gdf['coords'].str[1] <= 42.55)])
gdf.plot(ax=ax)
gdf.apply(lambda x: ax.annotate(text=x.NAME, xy=x.geometry.centroid.coords[0], ha='center'),axis=1)
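
The filtering step can be illustrated without any geometry objects. The sketch below assumes the representative points have already been split into plain lon and lat columns; those column names, the helper in_bbox, and the sample rows are all hypothetical.

```python
import pandas as pd

# illustrative points: only "A" lies inside the box used in the question
pts = pd.DataFrame({
    "NAME": ["A", "B", "C"],
    "lon": [-84.0, -83.0, -83.9],
    "lat": [42.3, 42.3, 43.0],
})


def in_bbox(df, lon_min, lon_max, lat_min, lat_max):
    """Keep rows whose point falls inside the bounding box (inclusive)."""
    mask = df["lon"].between(lon_min, lon_max) & df["lat"].between(lat_min, lat_max)
    return df[mask]


subset = in_bbox(pts, -84.2, -83.4, 42, 42.55)
print(subset)
```

Plotting and labelling only the filtered rows means no labels exist outside the extent, so the axis limits follow the data.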

Matplotlib histogram with errorbars

I have created a histogram with matplotlib using the pyplot.hist() function. I would like to add a Poisson error, the square root of the bin height (sqrt(binheight)), to the bars. How can I do this?
The return tuple of .hist() includes return[2] -> a list of 1 Patch objects. I could only find out that it is possible to add errors to bars created via pyplot.bar().

Indeed you need to use bar. You can take the output of np.histogram and plot it as a bar chart:
import numpy as np
import matplotlib.pyplot as plt
data = np.array(np.random.rand(1000))
y,binEdges = np.histogram(data,bins=10)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
menStd = np.sqrt(y)
width = 0.05
plt.bar(bincenters, y, width=width, color='r', yerr=menStd)
plt.show()
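
The computation can be separated from the plotting. The following sketch (the helper name poisson_hist is hypothetical) returns the bin centers, counts, sqrt(N) errors, and the actual bin widths, so the bar width does not have to be hard-coded:

```python
import numpy as np


def poisson_hist(data, bins=10, value_range=None):
    """Histogram counts with Poisson (sqrt(N)) errors per bin."""
    counts, edges = np.histogram(data, bins=bins, range=value_range)
    centers = 0.5 * (edges[1:] + edges[:-1])
    widths = np.diff(edges)    # one width per bin
    errors = np.sqrt(counts)   # Poisson error: square root of the bin height
    return centers, counts, errors, widths


centers, counts, errors, widths = poisson_hist(
    [0.1, 0.2, 0.6, 0.7, 0.8], bins=2, value_range=(0, 1)
)
print(centers, counts, errors, widths)
```

These arrays can then be passed to plt.bar(centers, counts, width=widths, yerr=errors).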

Alternative Solution
You can also use a combination of pyplot.errorbar() and drawstyle keyword argument. The code below creates a plot of the histogram using a stepped line plot. There is a marker in the center of each bin and each bin has the requisite Poisson errorbar.
import numpy
from matplotlib import pyplot
x = numpy.random.rand(1000)
y, bin_edges = numpy.histogram(x, bins=10)
bin_centers = 0.5*(bin_edges[1:] + bin_edges[:-1])
pyplot.errorbar(
    bin_centers,
    y,
    yerr=y**0.5,            # Poisson error per bin
    marker='.',
    drawstyle='steps-mid'   # step changes midway between the bin centers
)
pyplot.show()
My personal opinion
When plotting the results of multiple histograms on the same figure, line plots are easier to distinguish. In addition, they look nicer when plotting with yscale='log'.
