Seaborn how to add number of samples per category in sns.catplot


I draw a catplot using:
s = sns.catplot(x="level", y="value", hue="cond", kind=graph_type, data=df)
However, the groups are not the same size:
"Minimal" has n=12 samples, and "Moderate" has n=18 samples.
How can I add this information to the graph?

Calculate the group sizes manually and add them to the x tick labels, something like this:
import matplotlib.pyplot as plt
import seaborn as sns
exercise = sns.load_dataset("exercise")
# count the samples per category; the same order is reused for the plot and the labels
cnts = dict(exercise['time'].value_counts())
key = list(cnts.keys())
vals = list(cnts.values())
g = sns.catplot(x="time", y="pulse", hue="kind", order=key,
                data=exercise, kind="box")
g.set_axis_labels("", "pulse")
# append the sample count to each tick label
g.set_xticklabels([f"{k}\n({v})" for k, v in zip(key, vals)])
plt.show()
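For the column names used in the question ("level", "value", "cond"), the same idea can be sketched as below. The data here is synthetic and only mimics the group sizes from the question (12 and 18); the "n=" label format and the kind="box" choice are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# synthetic stand-in for the question's dataframe (values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "level": ["Minimal"] * 12 + ["Moderate"] * 18,
    "cond": ["a", "b"] * 15,
    "value": rng.normal(size=30),
})

# fix the category order once, so the plot and the labels cannot get out of sync
order = ["Minimal", "Moderate"]
counts = df["level"].value_counts()
labels = [f"{lvl}\n(n={counts[lvl]})" for lvl in order]

g = sns.catplot(data=df, x="level", y="value", hue="cond", kind="box", order=order)
g.set_xticklabels(labels)
g.set_axis_labels("", "value")
g.savefig("catplot_counts.png")
```

Passing the same order list to catplot and to the label construction is what keeps each count under the right category.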

Related

Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is laid out horizontally, and to my knowledge there is no equivalent layout parameter to arrange the plots vertically. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
             x_vars = X_train.select_dtypes(exclude=['object']).columns,
             y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()

What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
                            np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
                facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()
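To make the "long form" requirement concrete, here is a minimal sketch of what melt does to a tiny wide dataframe. The column names LotArea and YearBuilt and all values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# a tiny "wide" dataframe: one column per variable (hypothetical names and values)
wide = pd.DataFrame({
    "SalePrice": [100, 200],
    "LotArea": [10, 20],
    "YearBuilt": [1990, 2005],
})

# "long" form: one row per (SalePrice, variable) pair, with the
# former column names collected in a 'variable' column
long_df = wide.melt(id_vars="SalePrice", value_vars=["LotArea", "YearBuilt"])
print(long_df)
```

Each original column becomes a block of rows, which is what lets col='variable' in sns.lmplot or sns.displot create one subplot per original column.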

Barplot per each ax in matplotlib

I have the following dataset, ratings in stars for two fictitious places:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'rating':[1,2,4,5,5,5,3,1,3,3,3,5,2]})
Since the rating is categorical (not continuous) data, I convert it to a category:
df['rating_cat'] = pd.Categorical(df['rating'])
What I want is one bar plot per fictitious place ('A' or 'B'), showing the count for each rating. This is the intended plot:
I guess a for loop over each value in id could work, but I have some trouble getting the counts:
fig, ax = plt.subplots(1,2,figsize=(6,6))
axs = ax.flatten()
cats = df['rating_cat'].cat.categories.tolist()
ids_uniques = df.id.unique()
for i in range(len(ids_uniques)):
    ax[i].bar(df[df['id']==ids_uniques[i]], df['rating'].size())
But it returns the error TypeError: 'int' object is not callable
Perhaps I am overcomplicating this. Could you please guide me with this code?

The pure matplotlib way:
from math import ceil
# Prepare the data for plotting
df_plot = df.groupby(["id", "rating"]).size()
unique_ids = df_plot.index.get_level_values("id").unique()
# Calculate the grid spec. This will be an n x 2 grid
# to fit one chart per id
ncols = 2
nrows = ceil(len(unique_ids) / ncols)
fig = plt.figure(figsize=(6,6))
for i, id_ in enumerate(unique_ids):
    # In a figure grid spanning nrows x ncols, plot into the
    # axes at position i + 1
    ax = fig.add_subplot(nrows, ncols, i+1)
    df_plot.xs(id_).plot(ax=ax, kind="bar")
You can simplify things a lot with Seaborn:
import seaborn as sns
sns.catplot(data=df, x="rating", col="id", col_wrap=2, kind="count")
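As for the TypeError in the question: Series.size is an attribute holding an integer, not a method, so df['rating'].size() tries to call an int. Below is a minimal sketch of the question's loop with the counts computed per id via value_counts. The reindex step (so both panels show all ratings, including zero counts) and sharey=True are choices made here for illustration, not something from the original posts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'id': ['A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'rating': [1,2,4,5,5,5,3,1,3,3,3,5,2]})

# all rating values that occur anywhere, so every panel gets the same x axis
ratings = sorted(df['rating'].unique())

fig, axs = plt.subplots(1, 2, figsize=(6, 3), sharey=True)
counts_by_id = {}
for ax, (id_, group) in zip(axs, df.groupby('id')):
    # count each rating within this id; ratings that never occur get 0
    counts = group['rating'].value_counts().reindex(ratings, fill_value=0)
    counts_by_id[id_] = counts
    ax.bar([str(r) for r in counts.index], counts.values)
    ax.set_title(id_)
    ax.set_xlabel('rating')
axs[0].set_ylabel('count')
fig.tight_layout()
fig.savefig("ratings_per_id.png")
```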

If you're ok with installing a new library, seaborn has a very helpful countplot. Seaborn uses matplotlib under the hood and makes certain plots easier.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'id':['A','A','A','A','A','A','A','B','B','B','B','B','B'],
                   'rating':[1,2,4,5,5,5,3,1,3,3,3,5,2]})
sns.countplot(
    data = df,
    x = 'rating',
    hue = 'id',
)
plt.show()
plt.close()

How to plot frequency distribution graph using Matplotlib?

I trust you are doing well. I am using a data frame with two columns: screen and its frequency. I am trying to find the relationship between each screen and how often it appears. Now I want to see the frequencies of all screens together as a sort of summary graph. Imagine putting all of those frequencies into an array and wanting to study the distribution in that array. Below is the code I have tried so far:
data = pd.read_csv('frequency_list.csv')
new_columns = data.columns.values
new_columns[1] = 'frequency'
data.columns = new_columns
import matplotlib.pyplot as plt
%matplotlib inline
dataset = data.head(10)
dataset.plot(x = "screen", y = "frequency", kind = "bar")
plt.show()
col_one_list = unpickled_df['screen'].tolist()
col_one_arr = unpickled_df['screen'].head(10).to_numpy()
plt.hist(col_one_arr) #gives you a histogram of your array 'a'
plt.show() #finishes out the plot
Below is the screenshot of my data frame containing screen as one column and frequency as another. Can you help me to find out a way to plot a frequency distribution graph? Thanks in advance.

Will a bar plot work? Here's an example:
import pandas as pd
import matplotlib.pyplot as plt
freq = [102,98,56,117]
screen = ['A','B','C','D']
df = pd.DataFrame(list(zip(screen, freq)), columns=['screen', 'freq'])
plt.bar(df.screen,df.freq)
plt.xlabel('x')
plt.ylabel('count')
plt.show()
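If the goal is instead the distribution of the frequency values themselves (the question's "putting all of those frequencies into an array"), a histogram of the frequency column is one option. This is a rough sketch reusing the made-up numbers from the bar plot above; the choice of bins=4 is an arbitrary assumption and should be tuned to the real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'screen': ['A', 'B', 'C', 'D'],
                   'freq': [102, 98, 56, 117]})

fig, ax = plt.subplots()
# histogram of the frequency values: how many screens fall into each frequency range
counts, bin_edges, _ = ax.hist(df['freq'], bins=4, edgecolor='k')
ax.set_xlabel('frequency')
ax.set_ylabel('number of screens')
fig.savefig("freq_hist.png")
```

Here the x axis is the frequency value and the bar height is the number of screens in that range, which is the reverse of the bar plot of frequency per screen.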

pandas dataframe bar plot put space between bars

I want my image to look like this
But right now my image looks like this
How do I reduce the space between bars without setting the bar width to 1?
Here is my code:
plot=repeat.loc['mean'].plot(kind='bar',rot=0,alpha=1,cmap='Reds',
                             yerr=repeat.loc['std'],error_kw=dict(elinewidth=0.02,ecolor='grey'),
                             align='center',width=0.2,grid=None)
plt.ylabel('')
plt.grid(False)
plt.title(cell,ha='center')
plt.xticks([])
plt.yticks([])
plt.ylim(0,120)
plt.tight_layout()

Build the plot from scratch if the top-level functions from pandas or seaborn do not give you the desired result! :)
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# some fake data, sorted by row mean in descending order
data = np.random.randn(10,10) + 1
data = data[np.argsort(np.average(data,axis=1))[::-1],:]
avg = np.average(data,axis=1)
std = np.std(data,axis=1)
# a practical helper from seaborn to quickly generate the colors
colors = sns.color_palette('Reds',n_colors = data.shape[0])
fig, ax = plt.subplots()
pos = range(10)
ax.bar(pos,avg,width=1)
for col,patch in zip(colors,ax.patches):
    patch.set_facecolor(col)
    patch.set_edgecolor('k')
# draw the standard deviation as a line above each bar
for i,p in enumerate(pos):
    ax.plot([p,p],[avg[i],avg[i]+std[i]],color='k',lw=2, zorder=-1)
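When drawing the bars yourself, the gap between neighbours is simply the distance between bar positions minus the bar width. So one way to keep narrow bars (width 0.2, as in the question) while shrinking the space between them is to place the bars closer together instead of at integer positions. A minimal sketch, with made-up means and standard deviations and an assumed gap of 0.05:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# hypothetical values standing in for repeat.loc['mean'] and repeat.loc['std']
means = np.array([80.0, 95.0, 60.0, 100.0])
stds = np.array([5.0, 8.0, 4.0, 10.0])

width = 0.2   # bar width from the question
gap = 0.05    # desired space between neighbouring bars (assumption)
pos = np.arange(len(means)) * (width + gap)  # bar centres

fig, ax = plt.subplots()
bars = ax.bar(pos, means, width=width, yerr=stds,
              error_kw=dict(elinewidth=1, ecolor='grey'))
# widen the x limits so the narrow bars do not stretch to fill the axes
ax.set_xlim(pos[0] - 1, pos[-1] + 1)
ax.set_ylim(0, 120)

# check: space between the right edge of one bar and the left edge of the next
edges = [(p.get_x(), p.get_x() + p.get_width()) for p in bars.patches]
measured_gaps = [nxt[0] - cur[1] for cur, nxt in zip(edges, edges[1:])]
print(measured_gaps)
fig.savefig("narrow_bars.png")
```

The explicit x limits matter: without them matplotlib scales the axes to the bars, so narrow, tightly packed bars would still look wide.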

Graphing the sum and average; pandas

total_income = df.groupby('title_year')['gross'].sum()
average_income = df.groupby('title_year')['gross'].mean()
print(plt.semilogy(total_income,average_income))
I want to plot the total and average income on the same graph as two lines, with the x-axis showing the years from 1916 to 2016 and the y-axis in dollars. But my code isn't doing that. How should I change my code to get this?
Here's my output of my code.

This is my data file named data.csv:
year,gross
2015,45
2015,47
2015,49
2016,76
2016,78
2016,87
2017,103
2017,115
2017,133
1.) This is all the code to get the semi-log plot (logarithmic y-axis):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv")
total_income = df.groupby('year')['gross'].sum()
average_income = df.groupby('year')['gross'].mean()
total_income.plot(label="Total Income")
average_income.plot(label="Average Income")
plt.xlabel("Year")
plt.ylabel("log$_{10}$(Gross)")
plt.yscale("log")
plt.legend()
plt.tight_layout()
plt.savefig("plot.png")
2.) This is how you use plt.semilogy():
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv")
total_income = pd.DataFrame(df.groupby('year')['gross'].sum())
average_income = pd.DataFrame(df.groupby('year')['gross'].mean())
plt.semilogy(total_income.index, total_income["gross"],
             label="Total Income")
plt.semilogy(average_income.index, average_income["gross"],
             label="Average Income")
plt.xlabel("Year")
plt.ylabel("log$_{10}$(Gross)")
plt.legend()
plt.tight_layout()
plt.savefig("plot.png")
Methods 1.) and 2.) produce the same plot.
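As a possible third variant, both statistics can be computed in a single groupby with agg and plotted in one call with logy=True. This is only a sketch: it inlines the sample data.csv from above via io.StringIO so it runs without a file, and the axis label text is a choice made here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# same content as the sample data.csv above, inlined for a self-contained example
csv_text = """year,gross
2015,45
2015,47
2015,49
2016,76
2016,78
2016,87
2017,103
2017,115
2017,133
"""
df = pd.read_csv(io.StringIO(csv_text))

# one groupby, two aggregations: columns 'sum' and 'mean', indexed by year
summary = df.groupby('year')['gross'].agg(['sum', 'mean'])

ax = summary.plot(logy=True)          # one line per column, log-scaled y axis
ax.set_xticks(summary.index)          # show whole years only
ax.set_xlabel("Year")
ax.set_ylabel("Gross (log scale)")
ax.legend(["Total Income", "Average Income"])
plt.tight_layout()
plt.savefig("plot_agg.png")
```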
