Define hues by column values in seaborn barplot - matplotlib

I would like the colour of the columns to be determined by their value on the x-axis, e.g. bars with identical values on the x-axis should have identical colours assigned to them.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,palette='magma')
This is what it looks like at the moment with default settings. I presume there is a simple elegant way of doing this, but interested in any solution.

Here is a solution:
import seaborn as sns
import matplotlib as mpl, matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],
data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,
palette=mpl.cm.magma(df['col1']*.1))
Note: mpl.cm.magma is a Colormap instance and is used to convert data values (floats) from the interval [0, 1] to colors that the Colormap represents. If you want "auto scaling" of your data values, you could use palette=mpl.cm.ScalarMappable(cmap='magma').to_rgba(df['col1']) instead in the sns.barplot() call.
Here is the output:
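As a minimal sketch of the "auto scaling" variant mentioned in the note, the snippet below maps each value to a colour through ScalarMappable and checks that equal values receive equal colours. The names values, mappable and colors are only illustrative; the data is the col1 column from the question.

```python
import numpy as np
from matplotlib import cm

values = np.array([2.3423, 4.435, 9.234, 9.234, 2.456, 6.435])

# With no explicit norm, to_rgba() scales the data linearly so that
# min(values) maps to 0 and max(values) maps to 1 before the colormap lookup.
mappable = cm.ScalarMappable(cmap='magma')
colors = mappable.to_rgba(values)  # one RGBA row per bar, shape (6, 4)

# Rows 2 and 3 hold the same value (9.234), so their colours are identical.
print(np.allclose(colors[2], colors[3]))
```

The colors array can then be passed as palette in the sns.barplot() call, in the same way as the array returned by mpl.cm.magma(...) above.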

Related

transform the values of one axis to its log

I'm trying to transform the scales on y-axis to the log values. For example, if one of the numbers on y is 0.01, I want to get -2 (which is log(0.01)). How should I do this in matplotlib (or any other library)?!
Thanks,
Without a log scale (plt.yscale('log')), few of the visible y-ticks will fall on values whose logarithm is a round number. With a log scale, you can change the "formatter" to a function that only shows the exponent. Also note that in the latest seaborn version distplot has been replaced by histplot(..., kde=True) or kdeplot(...).
Here is an example:
import matplotlib.pyplot as plt
from matplotlib.ticker import LogFormatterExponent
import numpy as np
import seaborn as sns
x = np.random.randn(10, 1000).cumsum(axis=1).ravel()
ax = sns.histplot(x, kde=True, stat='density', color='purple')
ax.set_yscale('log')
ax.yaxis.set_major_formatter(LogFormatterExponent(base=10.0, labelOnlyBase=True))
ax.set_ylabel(ax.get_ylabel() + ' (exponent)')
ax.margins(x=0)
plt.show()
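If the goal is to have the transformed numbers themselves rather than relabelled ticks, one option (a sketch, not part of the answer above) is to take the base-10 logarithm of the data before plotting. The array y is made-up sample data.

```python
import numpy as np

y = np.array([0.01, 0.1, 1.0, 10.0])

# Base-10 logarithm of each value; 0.01 becomes -2, as in the question.
log_y = np.log10(y)
print(log_y)
```

Plotting log_y on a linear axis then shows the exponents directly, at the cost of losing the original values from the axis.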

Display x-axis values on horizontal matplotlib histogram

I'd like to use matplotlib to display a horizontal histogram similar to the one below:
The code below works fine for vertical histograms:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
plt.hist(df['A'])
plt.show()
The orientation='horizontal' parameter makes the bars horizontal, but clobbers the horizontal scale.
plt.hist(df['A'],orientation='horizontal')
The following works, but feels like a lot of work. Is there a better way?
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.set_xticks([0,5,10])
ax.set_xticklabels([0,5,10])
ax.set_yticks([0,1])
ax.set_yticklabels(['Male','Female'])
df['A'].hist(ax=ax,orientation='horizontal')
fig.tight_layout() # Improves appearance a bit.
plt.show()
plt.hist(df['A']) only works by coincidence. I would recommend not to use plt.hist for non-numeric or categorical plots - it's not meant to be used for that.
Also, it's often a good idea to separate data aggregation from visualization. So, using pandas plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
df["A"].value_counts().plot.barh()
plt.show()
Or using matplotlib plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
counts = df["A"].value_counts()
plt.barh(counts.index, counts)
plt.show()

Seaborn y labels are overlapping

I tried to make a categorical plot of my data. This is my code and the resulting graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
sns.set_style("ticks")
sns.set_context("paper", font_scale=1, rc={"lines.linewidth": 6})
sns.catplot(y = "Region",x = "Interest by subregion",data = sample)
Image:
How can I make the y-labels more spread out and have a bigger font?
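Try setting the figure size through catplot's height and aspect parameters (catplot is a figure-level function, and there is no sns.figure), e.g. sns.catplot(..., height=8, aspect=1.5), and raise font_scale in sns.set_context("paper", font_scale=1.5) to get a bigger font.
Try different values for these parameters to get the best results.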
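A minimal sketch of sizing a catplot, assuming a recent seaborn where catplot accepts height and aspect. The DataFrame below is a hypothetical stand-in for the sample data in the question, and the values 8, 1.2 and 1.5 are arbitrary starting points.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the `sample` DataFrame from the question.
sample = pd.DataFrame({
    "Region": [f"Region {i}" for i in range(20)],
    "Interest by subregion": range(20),
})

sns.set_style("ticks")
# font_scale > 1 enlarges tick and axis labels relative to the context default.
sns.set_context("paper", font_scale=1.5)

# height is the facet height in inches; the width is height * aspect.
g = sns.catplot(y="Region", x="Interest by subregion", data=sample,
                height=8, aspect=1.2)
```

A taller figure gives each y-label more vertical room, which is what spreads the labels out.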

prevent overlapping bars using seaborn with pandas plotting

I am trying to use pandas plotting to create a stacked horizontal barplot with a seaborn import. I would like to remove space between the bars, but also not have the bars overlap. This is what I've tried:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot.barh(stacked=True, width=1)
This seems to work without importing seaborn, but I like the seaborn style, and it is usually imported in the IPython notebook I am working in. Is this possible?
This artifact is also visible with matplotlib defaults if you set the bar linewidth to what seaborn style has:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=.5)
A solution would be to increase the bar line width back to roughly the matplotlib default:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=1)
Perhaps you should reduce the line width?
import matplotlib.pyplot as plt
import seaborn as sns
f, ax = plt.subplots(figsize=(10, 10))
df.plot(kind='barh', stacked=True, width=1, lw=0.1, ax=ax)
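To check what the lw keyword used in the snippets above actually does, the bar patches can be inspected after plotting. This is only a sketch with random data; edge_widths is an illustrative name.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(15, 3))

fig, ax = plt.subplots(figsize=(6, 6))
# lw is forwarded to matplotlib and sets the edge line width of every bar.
df.plot(kind="barh", stacked=True, width=1, lw=1, ax=ax)

# Each bar segment is a Rectangle patch; collect the distinct edge widths.
edge_widths = {patch.get_linewidth() for patch in ax.patches}
print(edge_widths)
```

Repeating this with and without the seaborn import shows whether the style changes the edge width when lw is not given explicitly.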

How to plot a pandas timeseries using months/year resolution (with few lines of code)?

Assume we want to plot a time series, e.g.:
import pandas as pd
import numpy as np
a=pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b=pd.Series(np.random.randn(len(a)), index=a)
b.plot()
The result is a figure in which the x-axis has years as labels; I would like to get month-year labels instead. Is there a fast way to do this (possibly avoiding the use of tens of lines of complex code calling matplotlib)?
Pandas does some really weird stuff to the Axes objects, making it hard to avoid matplotlib calls.
Here's how I would do it
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
which gives me:
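To also control how many month labels appear, a locator can be set alongside the formatter. This is a sketch under the assumption that one tick every six months is acceptable; the interval is an arbitrary choice, and pd.date_range is used to build the index.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

a = pd.date_range(start='2010-01-01', end='2014-01-01', freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)

fig, ax = plt.subplots()
ax.plot(b.index, b)
# One major tick every 6 months (the interval is an arbitrary choice).
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
# Rotate the date labels so they do not overlap.
fig.autofmt_xdate()
```

A smaller interval gives denser labels; with daily data over four years, anything below about three months tends to get crowded.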