Formatting Seaborn Factorplot y-labels to percentages [duplicate] - matplotlib

I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is format as float and I want to change the y axis to percentages. All of the solutions I found use ax.xyz syntax and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above.)
How can I format the y axis as percentages without changing the line above?
Here is the solution I found but requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.

pandas dataframe plot will return the ax for you, And then you can start to manipulate the axes whatever you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])

Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatterinstead (and also stripped the uneccessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.

For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)

I'm late to the game but I just realize this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.
Echoing #Mad Physicist answer, using the package PercentFormatter it would be:
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer

I propose an alternative method using seaborn
Working code:
import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
#changing ylables ticks
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)

You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.

Based on the answer of #erwanp, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))

Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])

add a line of code
ax.yaxis.set_major_formatter(ticker.PercentFormatter())

Related

Sns barplot does not sort sliced values

I want to plot from pd df using sns barplot. Everything works fine :
code associated :
result = df.groupby(['Code departement']).size().sort_values(ascending=False)
x=result.index
y=result.values
plot=sns.barplot(x, y)
plot.set(xlabel='Code departement', ylabel='Nombre de transactions')
sns.barplot(x, y, data=df).set_title('title')
But as you can see in PLOT 1, there are too many bars so I just want the 10 highest, and when I slice x and y :
x=result[:10].index
y=result[:10].values
plot=sns.barplot(x, y)
It prints bars unordered like this :
I checked by printing x and y (sliced) and they are right ordered, Idk what I am missing thank you for your help
You didn't state the version you are using, but probably it isn't the latest. Seaborn as well as matplotlib receive quite some improvements with each new version.
With seaborn 0.11.1 you'd get a warning, as x and y is preferred to be passed via keywords, i.e. sns.barplot(x=x, y=y). The warning tries to avoid confusion with the data= keyword. Apart from that, the numeric x-values would appear sorted numerically.
The order can be controlled via the order= keyword. In this case, sns.barplot(x=x, y=y, order=x). To only have the 10 highest, you can pass sns.barplot(x=x, y=y, order=x[:10]).
Also note that you are creating the bar plot twice (just to change the title?), which can be very confusing. As sns.barplot returns the ax (the subplot onto which the plot has been drawn), the usual approach is ax = sns.barplot(...) and then ax.set_title(...). (The name ax is preferred, to easier understand how matplotlib and seaborn example code can be employed in new code.)
The following example code has been tested with seaborn 0.11.1:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
print(sns.__version__)
df = pd.DataFrame({'Code departement': np.random.randint(1, 51, 1000)})
result = df.groupby(['Code departement']).size().sort_values(ascending=False)
x = result.index
y = result.values
ax = sns.barplot(x, y, order=x[:10])
ax.set(xlabel='Code departement', ylabel='Nombre de transactions')
ax.set_title('title')
plt.show()

transform the values of one axis to its log

I'm trying to transform the scales on y-axis to the log values. For example, if one of the numbers on y is 0.01, I want to get -2 (which is log(0.01)). How should I do this in matplotlib (or any other library)?!
Thanks,
Without plt.yscale('log') there will be few y-ticks visible that have a nice number as log. You can change the "formatter" to a function that only shows the exponent. Also note that in the latest seaborn version distplot has been replaced by histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()

matplotlib - seaborn - the numbers on the correlation plots are not readable

The plot below shows the correlation for one column. The problem is that the numbers are not readable, because there are many columns in it.
How is it possible to show only 5 or 6 most important columns and not all of them with very low importance?
plt.figure(figsize=(20,3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:].T, annot=True,
cmap='Spectral_r', vmax=0.9, vmin=-0.31)
You can limit the cells shown via .iloc[1:7]. If you also want to show the highest negative values, you could create a second plot with .iloc[-6:]. To have both together, you could use numpy's slicing function and write .iloc[np.r_[1:4, -3:0]].
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.rand(7, 27), columns=['price'] + [*'abcdefghijklmnopqrstuvwxyz'])
plt.figure(figsize=(20, 3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:7].T,
annot=True, annot_kws={'rotation':90, 'size': 20},
cmap='Spectral_r', vmax=0.9, vmin=-0.31)
plt.show()
annot can also be a list of labels. Using this, you can define a string matrix that you use to display the desired numbers and set the others to an empty string.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import seaborn as sns; sns.set_theme()
import pandas as pd
from string import ascii_letters
# generate random data
rs = np.random.RandomState(33)
df = pd.DataFrame(data=rs.normal(size=(100, 26)),
columns=list(ascii_letters[26:]))
importance_index = 5 # until which idx to hide values
data = df.corr()[['A']].sort_values('A', ascending=False).iloc[1:].T
labels = data.astype(str) # make a str-copy
labels.iloc[0,:importance_index] = ' ' # mask columns that you want to hide
sns.heatmap(data, annot=labels, cmap='Spectral_r', vmax=0.9, vmin=-0.31, fmt='', annot_kws={'rotation':90})
plt.show()
The output on some random data:
This works but it has its limits, particulary with setting fmt='' (can't use it to conveniently format decimals anymore, need to do it manually now). I would also question whether your approach is even the best one to take here. I think consistency in plots is quite important. I would rather evaluate if we can't rotate the heatmap labels (I've included it above) or leave them out completely since it is technically redundant due to the color-coding. Alternatively, you could only plot the cells with the "important" values.

how to prevent seaborn to skip year in xtick label in Timeseries Plot

I have included the screenshot of the plot. Is there a way to prevent seaborn from skipping the xtick labels in timeseries data.
Most seaborn functions return a matplotlib object, so you can control the number of major ticks displayed via matplotlib. By default, matplotlib will auto-scale, which is why it hides some year labels, you can try to set the MaxNLocator.
Consider the following example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load data
df = sns.load_dataset('flights')
df.drop_duplicates('year', inplace=True)
df.year = df.year.astype('str')
# plot
fig, ax = plt.subplots(figsize=(5, 2))
sns.lineplot(x='year', y='passengers', data=df, ax=ax)
ax.xaxis.set_major_locator(plt.MaxNLocator(5))
This gives you:
ax.xaxis.set_major_locator(plt.MaxNLocator(10))
will give you
Agree with answer of #steven, just want to say that methods for xticks like plt.xticks or ax.xaxis.set_ticks seem more natural to me. Full details can be found here.

How to plot Series with selective ticks?

I have a Series that I would like to plot as a bar chart: pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts()
Since I have many bars I only want to display some (equidistant) ticks.
However, unless I actively work against it, pyplot will print the wrong labels. E.g. if I leave out set_xticklabels in the code below I get
where every element from the index is taken and just displayed with the specified distance.
This code does what I want:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
s = pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts().sort_index()
mi,ma = min(s.index), max(s.index)
s = s.reindex(range(mi,ma+1,1), fill_value=0)
distance = 10
a = s.plot(kind='bar')
condition = lambda t: int(t[1].get_text()) % 10 == 0
ticks_,labels_=zip(*filter(condition, zip(a.get_xticks(), a.get_xticklabels())))
a.set_xticks(ticks_)
a.set_xticklabels(labels_)
plt.show()
But I still feel like I'm being unnecessarily clever here. Am I missing a function? Is this the best way of doing that?
Consider not using a pandas bar plot in case you intend to plot numeric values; that is because pandas bar plots are categorical in nature.
If instead using a matplotlib bar plot, which is numeric in nature, there is no need to tinker with any ticks at all.
s = pd.Series([-4,2, 3,3, 4,5,9,20]).value_counts().sort_index()
plt.bar(s.index, s)
I think you overcomplicated it. You can simply use the following. You just need to find the relationship between the ticks and the ticklabels.
a = s.plot(kind='bar')
xticks = np.arange(0, max(s)*10+1, 10)
plt.xticks(xticks + abs(mi), xticks)