transform the values of one axis to its log - matplotlib

I'm trying to relabel the ticks on the y-axis with their log values. For example, if one of the numbers on y is 0.01, I want the tick to read -2 (which is log10(0.01)). How should I do this in matplotlib (or any other library)?
Thanks,

Without plt.yscale('log'), few of the visible y-ticks will have a round number as their log. With a log scale in place, you can change the tick "formatter" to one that shows only the exponent. Also note that in recent seaborn versions distplot has been replaced by histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()
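If you would rather compute the label yourself, a FuncFormatter can do the same job. The following is only a minimal sketch: the helper name log10_label and the sample data are made up for illustration, and the Agg backend is selected just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def log10_label(value, pos=None):
    """Return log10 of a tick value as a short string, e.g. 0.01 -> '-2'."""
    return f"{np.log10(value):g}"


# Illustrative data spanning several decades
x = np.linspace(0, 5, 50)
y = 10.0 ** (-x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_yscale("log")  # ticks land on powers of ten
ax.yaxis.set_major_formatter(FuncFormatter(log10_label))
ax.set_ylabel("log10(y)")
fig.canvas.draw()  # render once so the tick labels are generated
```

The data itself is untouched; only the text shown next to each tick changes.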

Related

Scale Y axis of matplotlib plot in jupyter notebook

I want to scale the Y axis so that I can see the values. With the code below, I can't see anything other than a thin black line. Changing the plot height doesn't expand the plot.
import numpy as np
import matplotlib.pyplot as plt
data=np.random.random((4,10000))
plt.rcParams["figure.figsize"] = (20,100)
#or swap line above with one below, still no change in plot height
#fig=plt.figure(figsize=(20, 100))
plt.matshow(data)
plt.show()
One way to do this is to repeat the values and then plot the result, but I would have thought it possible to just scale the height of the plot?
data_repeated = np.repeat(data, repeats=1000, axis=0)
You can do it like this:
import numpy as np
import matplotlib.pyplot as plt
data=np.random.random((4, 10000))
plt.figure(figsize=(40, 10))
plt.matshow(data, fignum=1, aspect='auto')
plt.show()
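The reason this works: matshow draws with an equal aspect ratio by default (square pixels), so a 4 x 10000 array stays a thin strip however tall the figure is. aspect='auto' lets the pixels stretch to fill the axes. Below is a sketch of the same idea using the object-oriented interface with imshow; the figure size, tick setup, and colorbar are illustrative choices rather than part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

data = np.random.random((4, 10000))

fig, ax = plt.subplots(figsize=(12, 3))
# aspect="auto" stretches each cell to fill the axes instead of keeping it square
image = ax.imshow(data, aspect="auto", interpolation="nearest")
ax.set_yticks(range(data.shape[0]))  # one tick per row
fig.colorbar(image, ax=ax)
fig.canvas.draw()
```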

Display x-axis values on horizontal matplotlib histogram

I'd like to use matplotlib to display a horizontal histogram similar to the one below:
The code below works fine for vertical histograms:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
plt.hist(df['A'])
plt.show()
The orientation='horizontal' parameter makes the bars horizontal, but clobbers the horizontal scale.
plt.hist(df['A'],orientation='horizontal')
The following works, but feels like a lot of work. Is there a better way?
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.set_xticks([0,5,10])
ax.set_xticklabels([0,5,10])
ax.set_yticks([0,1])
ax.set_yticklabels(['Male','Female'])
df['A'].hist(ax=ax,orientation='horizontal')
fig.tight_layout() # Improves appearance a bit.
plt.show()
plt.hist(df['A']) only works by coincidence. I would recommend not to use plt.hist for non-numeric or categorical plots - it's not meant to be used for that.
Also, it's often a good idea to separate data aggregation from visualization. So, using pandas plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
df["A"].value_counts().plot.barh()
plt.show()
Or using matplotlib plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
counts = df["A"].value_counts()
plt.barh(counts.index, counts)
plt.show()

plot shuffled array numpy

I am writing a very simple script that plots a sine using a Jupyter notebook (Python 3). When I put:
import numpy as np
import matplotlib.pyplot as plt
x=np.arange(0.0,5*np.pi,0.001)
y = np.sin(x)
plt.plot(x,y)
The plot is fine.
However, if I put:
import numpy as np
import matplotlib.pyplot as plt
x=np.arange(0.0,5*np.pi,0.001)
np.random.shuffle(x)
y = np.sin(x)
plt.plot(x,y)
the image is
I don't understand why shuffling the x BEFORE I ran sin does it.
thank you
Let's first simplify things a bit. We plot 4 points and annotate them with the order in which they are plotted.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
x=np.arange(4)
y = np.sin(x)
plt.plot(x,y, marker="o")
for i, (xi,yi) in enumerate(zip(x,y)):
    plt.annotate(str(i), xy=(xi,yi), xytext=(0,4),
                 textcoords="offset points", ha="center")
plt.show()
Now if we shuffle x and plot the same graph,
x=np.arange(4)
np.random.shuffle(x)
y = np.sin(x)
we see that the positions of the points are still the same, but while previously the first point was the one at (0,0), it is now the third one that appears there. Because of this randomized order, the connecting lines zigzag.
Now if you use enough points, all those lines will add up to look like a complete surface, which is what you get in your image.
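If the goal is to get the curve back from shuffled data, two options are to sort the points by x before drawing the line, or to draw markers only so that the order no longer matters. A minimal sketch, assuming a seeded generator and a two-panel layout chosen purely for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0, 5 * np.pi, 0.001)
rng = np.random.default_rng(0)  # seeded so the shuffle is reproducible
rng.shuffle(x)                  # shuffles x in place
y = np.sin(x)

# Indices that put x back into ascending order; apply them to both arrays
order = np.argsort(x)

fig, (ax_sorted, ax_dots) = plt.subplots(1, 2, figsize=(10, 3))
ax_sorted.plot(x[order], y[order])      # line drawn in ascending x order
ax_sorted.set_title("sorted by x")
ax_dots.plot(x, y, ".", markersize=1)   # no connecting lines, order irrelevant
ax_dots.set_title("markers only")
fig.canvas.draw()
```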

Formatting Seaborn Factorplot y-labels to percentages [duplicate]

I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as float and I want to change it to percentages. All of the solutions I found use ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above).
How can I format the y axis as percentages without changing the line above?
Here is the solution I found but requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
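To make the xmax and decimals arguments concrete, here is a small sketch. The sample fractions and the choice of decimals=1 are assumptions for illustration; format_pct is the PercentFormatter method that turns a single value into its label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# Illustrative data expressed as fractions between 0 and 1
fractions = [0.08, 0.12, 0.15, 0.17, 0.18, 0.185]

fig, ax = plt.subplots()
ax.bar(range(len(fractions)), fractions)

# xmax=1.0: a value of 1.0 is shown as 100%; decimals=1: one digit after the point
formatter = mtick.PercentFormatter(xmax=1.0, decimals=1)
ax.yaxis.set_major_formatter(formatter)

fig.canvas.draw()  # render once so the tick labels are generated
labels = [tick.get_text() for tick in ax.get_yticklabels()]

# A single value can also be formatted directly
quarter = formatter.format_pct(0.25, 1.0)
```

With the default xmax=100, data that is already in percent units (for example 40 meaning 40%) needs no rescaling.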
The pandas DataFrame plot method returns the ax for you, and you can then manipulate the axes however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeroes as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking I'd recommend using FuncFormatter for label formatting: it's reliable, and versatile.
For those who are looking for the quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
this assumes
import: from matplotlib import pyplot as plt
Python >=3.6 for f-String formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x)
I'm late to the game but I just realized this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.
Echoing @Mad Physicist's answer, using the PercentFormatter class it would be:
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer
I propose an alternative method using seaborn
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
# change the y tick labels
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)
You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
Based on the answer of @erwanp, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside the FuncFormatter() and combined with a lambda expression.
All wrapped:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))
Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])
Add one line of code (this assumes matplotlib.ticker has been imported as ticker):
ax.yaxis.set_major_formatter(ticker.PercentFormatter())

Define hues by column values in seaborn barplot

I would like the colour of the columns to be determined by their value on the x-axis, e.g. bars with identical values on the x-axis should have identical colours assigned to them.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,palette='magma')
This is what it looks like at the moment with default settings. I presume there is a simple elegant way of doing this, but interested in any solution.
Here is a solution:
import seaborn as sns
import matplotlib as mpl, matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],
                  data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,
                 palette=mpl.cm.magma(df['col1']*.1))
Note: mpl.cm.magma is a Colormap instance and is used to convert data values (floats) from the interval [0, 1] to colors that the Colormap represents. If you want "auto scaling" of your data values, you could use palette=mpl.cm.ScalarMappable(cmap='magma').to_rgba(df['col1']) instead in the sns.barplot() call.
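The scaling step can also be written out explicitly with a Normalize object, which makes it easy to check that equal values receive equal colours. This sketch swaps in plain matplotlib barh instead of seaborn to keep it self-contained; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

labels = ["A", "B", "C", "D", "E", "F"]
values = np.array([2.3423, 4.435, 9.234, 9.234, 2.456, 6.435])

# Map the data range linearly onto [0, 1], then look each value up in the colormap
norm = Normalize(vmin=values.min(), vmax=values.max())
colors = plt.get_cmap("magma")(norm(values))  # one RGBA row per bar

fig, ax = plt.subplots()
ax.barh(labels, values, color=colors)
ax.invert_yaxis()  # put "A" at the top
fig.canvas.draw()
```

Bars C and D share the value 9.234, so they get the same RGBA row.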