Seaborn y labels are overlapping - data-visualization

I tried to make a categorical plot of my data. This is my code and the resulting graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
sns.set_style("ticks")
sns.set_context("paper", font_scale=1, rc={"lines.linewidth": 6})
sns.catplot(y = "Region",x = "Interest by subregion",data = sample)
How can I make the y-labels more spread out and have a bigger font?

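Try setting the figure size through the height and aspect parameters of sns.catplot (it is a figure-level function that creates its own figure, and there is no sns.figure to call), and increase font_scale in sns.set_context, which your code currently leaves at 1.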
Try different values for these parameters to get the best results.
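As a rough sketch of that advice: the DataFrame below is a made-up stand-in for the asker's `sample`, and the values chosen for `font_scale`, `height` and `aspect` are arbitrary starting points, not recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the asker's `sample` DataFrame.
sample = pd.DataFrame({
    "Region": [f"Region {i}" for i in range(30)],
    "Interest by subregion": list(range(30)),
})

sns.set_style("ticks")
# font_scale > 1 enlarges the tick labels and axis labels.
sns.set_context("paper", font_scale=1.5)

# height is in inches and aspect is width / height, so this asks for a
# figure 8 inches wide and 10 inches tall. A taller figure gives each
# y label more vertical room.
g = sns.catplot(y="Region", x="Interest by subregion", data=sample,
                height=10, aspect=0.8)
width, height = g.fig.get_size_inches()
```

If the labels still collide, increasing `height` further is usually the first thing to try.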

Related

Line plot is filled in Pandas and Matplotlib

I am using pandas and matplotlib to plot a data set.
I am trying to plot two values from a DataFrame as a line plot, but the plot is displayed filled in rather than as a single line.
My code:
%matplotlib inline
from matplotlib import style
import pandas as pd
from matplotlib import pyplot as plt
covid = pd.read_csv('covid_19_india.csv')
#covid.head()
plt.style.use('Solarize_Light2')
plt.plot(covid.Date, covid.Deaths)
Help needed from community!

Display x-axis values on horizontal matplotlib histogram

I'd like to use matplotlib to display a horizontal histogram similar to the one below:
The code below works fine for vertical histograms:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
plt.hist(df['A'])
plt.show()
The orientation='horizontal' parameter makes the bars horizontal, but clobbers the horizontal scale.
plt.hist(df['A'],orientation='horizontal')
The following works, but feels like a lot of work. Is there a better way?
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.set_xticks([0,5,10])
ax.set_xticklabels([0,5,10])
ax.set_yticks([0,1])
ax.set_yticklabels(['Male','Female'])
df['A'].hist(ax=ax,orientation='horizontal')
fig.tight_layout() # Improves appearance a bit.
plt.show()
plt.hist(df['A']) only works by coincidence. I would recommend not using plt.hist for non-numeric or categorical data; it is not meant for that.
Also, it's often a good idea to separate data aggregation from visualization. So, using pandas plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
df["A"].value_counts().plot.barh()
plt.show()
Or using matplotlib plotting,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['Male'] * 10 + ['Female'] * 5})
counts = df["A"].value_counts()
plt.barh(counts.index, counts)
plt.show()
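A small variation on the matplotlib version, as a sketch: it uses the Axes interface and restricts the x-axis ticks to whole numbers, since the values are counts. The use of `MaxNLocator` and the "Count" axis label are additions made here for illustration, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd

df = pd.DataFrame({"A": ["Male"] * 10 + ["Female"] * 5})

# Aggregate first, then plot the aggregated counts.
counts = df["A"].value_counts()

fig, ax = plt.subplots()
ax.barh(counts.index, counts.values)
# Counts are whole numbers, so only allow integer tick positions on x.
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_xlabel("Count")
fig.tight_layout()
```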

Define hues by column values in seaborn barplot

I would like the colour of the columns to be determined by their value on the x-axis, e.g. bars with identical values on the x-axis should have identical colours assigned to them.
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,palette='magma')
This is what it looks like at the moment with default settings. I presume there is a simple, elegant way of doing this, but I am interested in any solution.
Here is a solution:
import seaborn as sns
import matplotlib as mpl, matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['A','B','C','D','E','F'],
                  data={'col1':np.array([2.3423,4.435,9.234,9.234,2.456,6.435])})
ax = sns.barplot(x='col1', y=df.index.values, data=df,
                 palette=mpl.cm.magma(df['col1']*.1))
Note: mpl.cm.magma is a Colormap instance and is used to convert data values (floats) from the interval [0, 1] to colors that the Colormap represents. If you want "auto scaling" of your data values, you could use palette=mpl.cm.ScalarMappable(cmap='magma').to_rgba(df['col1']) instead in the sns.barplot() call.
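To make the scaling step in the note explicit, here is a minimal sketch that normalises the values by hand instead of multiplying by a fixed factor. It assumes a linear mapping from the data minimum to the data maximum, which is also what a default ScalarMappable is expected to do; the variable names are illustrative.

```python
import numpy as np
from matplotlib import cm, colors

values = np.array([2.3423, 4.435, 9.234, 9.234, 2.456, 6.435])

# Normalize rescales linearly so the smallest value maps to 0
# and the largest maps to 1.
norm = colors.Normalize(vmin=values.min(), vmax=values.max())
scaled = norm(values)

# Calling a Colormap on values in [0, 1] returns one RGBA row per value.
palette = cm.magma(scaled)
```

Bars with identical values get identical colours because the mapping depends only on the value. The result can then be passed to the barplot call from the answer as `palette=list(palette)`.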

prevent overlapping bars using seaborn with pandas plotting

I am trying to use pandas plotting to create a stacked horizontal barplot with a seaborn import. I would like to remove space between the bars, but also not have the bars overlap. This is what I've tried:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot.barh(stacked=True, width=1)
This seems to work without importing seaborn, but I like the seaborn style, and seaborn is usually imported in the IPython notebook I am working in. Is this possible?
This artifact is also visible with matplotlib defaults if you set the bar linewidth to what seaborn style has:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=.5)
A solution would be to increase the bar line width back to roughly the matplotlib default:
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.rand(15, 3))
df.plot(stacked=True, width=1, kind="barh", lw=1)
Perhaps you should reduce the line width?
import matplotlib.pyplot as plt
import seaborn as sns
f, ax = plt.subplots(figsize=(10, 10))
# df is the DataFrame from the question
df.plot(kind='barh', stacked=True, width=1, lw=0.1, ax=ax)
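Taking the second suggestion to its limit, the sketch below sets the bar line width to zero, so no outline is drawn at all. This is a variation, not something either answer proposes verbatim, and whether it looks better is a matter of taste. The seaborn import is left out so the example runs with pandas and matplotlib alone; an explicit linewidth argument takes precedence over whatever the active style sets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(15, 3))

fig, ax = plt.subplots(figsize=(6, 6))
# width=1 makes neighbouring bars touch; linewidth=0 drops the outline
# that would otherwise be drawn over the shared edge.
df.plot(kind="barh", stacked=True, width=1, linewidth=0, ax=ax)

# Inspect the rectangles that were actually drawn.
bar_thickness = {round(p.get_height(), 6) for p in ax.patches}
edge_widths = {p.get_linewidth() for p in ax.patches}
```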

How to plot a pandas timeseries using months/year resolution (with few lines of code)?

Assume we want to plot a time series, e.g.:
import pandas as pd
import numpy as np
a = pd.date_range(start='2010-01-01', end='2014-01-01', freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)
b.plot()
The result is a figure whose x-axis has years as labels, but I would like month-year labels. Is there a quick way to do this (ideally without tens of lines of complex matplotlib code)?
Pandas installs its own tick locators and formatters on the Axes objects for time series, making it hard to avoid matplotlib calls.
Here's how I would do it:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01', end='2014-01-01', freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
which gives me tick labels in year-month format (for example 2010-01).
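If the tick positions should also fall on month boundaries, a locator can be combined with the formatter. The following is a sketch under assumptions: placing ticks on 1 January and 1 July and using the "%b %Y" label format are choices made here for illustration, and the month abbreviations depend on the active locale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

a = pd.date_range(start="2010-01-01", end="2014-01-01", freq="D")
b = pd.Series(np.random.randn(len(a)), index=a)

fig, ax = plt.subplots()
ax.plot(b.index, b.values)
# Put a major tick on the first day of January and July of every year.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
# Label each tick with an abbreviated month name and the year.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
# Rotate and right-align the labels so they do not collide.
fig.autofmt_xdate()
```

For a shorter span of data, `mdates.MonthLocator()` with no arguments places a tick on every month.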