I'd like to use matplotlib to display a horizontal histogram similar to the one below:
The code below works fine for vertical histograms:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
plt.hist(df['A'])
plt.show()
The orientation='horizontal' parameter makes the bars horizontal, but clobbers the horizontal scale.
plt.hist(df['A'],orientation='horizontal')
The following works, but feels like a lot of work. Is there a better way?
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.set_xticks([0,5,10])
ax.set_xticklabels([0,5,10])
ax.set_yticks([0,1])
ax.set_yticklabels(['Male','Female'])
df['A'].hist(ax=ax,orientation='horizontal')
fig.tight_layout() # Improves appearance a bit.
plt.show()
plt.hist(df['A']) only works by coincidence. I would recommend not using plt.hist for non-numeric or categorical data; it is not meant for that.
Also, it's often a good idea to separate data aggregation from visualization. So, using pandas plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
df["A"].value_counts().plot.barh()
plt.show()
Or using matplotlib plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
counts = df["A"].value_counts()
plt.barh(counts.index, counts)
plt.show()
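If you also want explicit control over the count axis, as in the manual tick setup from the question, the same aggregation can be drawn through an Axes object. This is only a minimal sketch: the tick positions 0, 5, 10 are copied from the question, and flipping the y-axis is an optional choice of mine, not something the question asked for.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'A': ['Male'] * 10 + ['Female'] * 5})
counts = df['A'].value_counts()        # aggregate first: Male 10, Female 5

fig, ax = plt.subplots()
ax.barh(counts.index, counts.values)   # one horizontal bar per category
ax.set_xticks([0, 5, 10])              # counts run along the x-axis
ax.invert_yaxis()                      # optional: first (largest) category on top
fig.tight_layout()
```

Call plt.show() afterwards to display the figure.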
Related
I have two DataFrames that have time-series data of BTC. I want to display the graphs side by side to analyze them.
display(data_df.plot(figsize=(15,20)))
display(model_df.plot(figsize=(15,20)))
When I plot them like this, they stack on top of each other vertically. I want them side by side.
Here's one way that might work using subplots (I'm guessing you want a total figsize of 30x20):
import matplotlib.pyplot as plt
fig,(ax0,ax1) = plt.subplots(nrows=1,ncols=2, figsize=(30,20))
data_df.plot(ax=ax0)
model_df.plot(ax=ax1)
You can use matplotlib.pyplot.subplots:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
model_df = pd.DataFrame(np.random.randint(0,100,size=(15, 2)), columns=list('AB'))
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,4))
# One subplot per column; both DataFrames are drawn on the same axes.
for col, ax in zip(data_df, axes):
    data_df[col].plot(ax=ax, label=f"data_df ({col})")
    model_df[col].plot(ax=ax, label=f"model_df ({col})")
    ax.legend()
I'm trying to transform the scale on the y-axis to log values. For example, if one of the numbers on y is 0.01, I want to get -2 (which is log10(0.01)). How should I do this in matplotlib (or any other library)?
Thanks,
Without plt.yscale('log'), few of the visible y-ticks will have a round number as their logarithm. You can change the tick "formatter" to one that shows only the exponent. Also note that in recent seaborn versions distplot has been deprecated in favour of histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()
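The formatter can also be a plain function wrapped in matplotlib.ticker.FuncFormatter. The following is a sketch only: the helper name log10_label and the sample data are made up for illustration and are not part of the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def log10_label(value, pos=None):
    """Return log10 of a tick value as a label, e.g. 0.01 -> '-2'."""
    if value <= 0:
        return ''                               # log10 is undefined here
    return f'{np.log10(value):g}'

y = np.array([0.01, 0.1, 1.0, 10.0, 100.0])     # illustrative data
fig, ax = plt.subplots()
ax.plot(y, marker='o')
ax.set_yscale('log')                            # major ticks land on powers of ten
ax.yaxis.set_major_formatter(FuncFormatter(log10_label))
ax.set_ylabel('log10(y)')
```

The data stay untransformed; only the tick labels change. If you would rather transform the data, plotting np.log10(y) on a linear axis gives the same labels.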
I would like the colour of the columns to be determined by their value on the x-axis, e.g. bars with identical values on the x-axis should have identical colours assigned to them.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,palette='magma')
This is what it looks like at the moment with default settings. I presume there is a simple, elegant way of doing this, but I am interested in any solution.
Here is a solution:
import seaborn as sns
import matplotlib as mpl, matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],
                  data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,
                 palette=mpl.cm.magma(df['col1']*.1))
Note: mpl.cm.magma is a Colormap instance and is used to convert data values (floats) from the interval [0, 1] to colors that the Colormap represents. If you want "auto scaling" of your data values, you could use palette=mpl.cm.ScalarMappable(cmap='magma').to_rgba(df['col1']) instead in the sns.barplot() call.
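To make the "auto scaling" variant concrete, here is a sketch that builds one colour per bar with an explicit Normalize and draws the bars with plain matplotlib, so it does not depend on how a particular seaborn version treats palette. The variable names (norm, mappable, colors) are illustrative, and the colorbar is an optional extra.

```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

df = pd.DataFrame(index=['A', 'B', 'C', 'D', 'E', 'F'],
                  data={'col1': np.array([2.3423, 4.435, 9.234, 9.234, 2.456, 6.435])})

# Map the data range [min, max] onto [0, 1], then onto the colormap.
norm = mpl.colors.Normalize(vmin=df['col1'].min(), vmax=df['col1'].max())
mappable = mpl.cm.ScalarMappable(norm=norm, cmap='magma')
colors = mappable.to_rgba(df['col1'].to_numpy())   # one RGBA row per bar

fig, ax = plt.subplots()
ax.barh(df.index, df['col1'], color=colors)
ax.invert_yaxis()                                  # 'A' on top
fig.colorbar(mappable, ax=ax, label='col1')        # optional colour legend
```

Because the colour is a function of the value alone, bars C and D (both 9.234) get the same colour.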
I am trying to use pandas plotting to create a stacked horizontal barplot with a seaborn import. I would like to remove space between the bars, but also not have the bars overlap. This is what I've tried:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot.barh(stacked=True, width=1)
This seems to work without importing seaborn. However, I like the seaborn style, and it is usually imported in the IPython notebook I am working in. Is this possible with seaborn imported?
This artifact is also visible with matplotlib defaults if you set the bar linewidth to what seaborn style has:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=.5)
A solution would be to increase the bar line width back to roughly the matplotlib default:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=1)
Perhaps you should reduce the line width?
import matplotlib.pyplot as plt
import seaborn as sns
f, ax = plt.subplots(figsize=(10, 10))
df.plot(kind='barh', stacked=True, width=1, lw=0.1, ax=ax)
Assume we want to plot a time series, e.g.:
import pandas as pd
import numpy as np
a=pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b=pd.Series(np.random.randn(len(a)), index=a)
b.plot()
The result is a figure in which the x-axis has years as labels; I would like to get month-year labels. Is there a fast way to do this (possibly avoiding tens of lines of complex code calling matplotlib)?
Pandas does some really weird stuff to the Axes objects, making it hard to avoid matplotlib calls.
Here's how I would do it:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
This labels each major tick in year-month form, e.g. 2010-01.
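If you also want to choose where the ticks fall, a locator can be combined with the formatter. A sketch, assuming ticks every January and July and a "%b %Y" label format; both are my choices for illustration, not something the question specified, and the month abbreviation depends on the active locale.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

idx = pd.date_range(start='2010-01-01', end='2014-01-01', freq='D')
series = pd.Series(np.random.randn(len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(series.index, series.to_numpy())
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))  # ticks in Jan and Jul
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))      # e.g. "Jan 2010"
fig.autofmt_xdate()                                              # rotate labels to avoid overlap
```

Call plt.show() afterwards to display the figure.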