Seaborn boxplot for classification with pandas wide to long [duplicate]

I have data that I would like to train an ML classifier on. The data is in wide format. I'd like to draw a boxplot with seaborn: sns.boxplot(x='variable', y='value', hue='target', data=df_train). How do I reshape the data so that I can pass it to sns.boxplot?
Sample data
import pandas as pd
from sklearn import datasets
X, y = datasets.make_classification(n_samples=100, n_features=5, random_state=1)
df_train = pd.DataFrame(X)
df_train['y']=y

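Use pd.melt (here through the equivalent DataFrame.melt method) to reshape the data from wide to long, keeping y as the identifier column:
import seaborn as sns
dfg_train = df_train.melt(id_vars='y')
sns.boxplot(x='variable', y='value', hue='y', data=dfg_train)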
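To see what melt produces before plotting, here is a minimal sketch on a tiny hand-made frame. The frame df_wide and its values are made up for illustration; only the melt call mirrors the answer above.

```python
import pandas as pd

# Hypothetical toy frame: two feature columns (0 and 1) plus the class label y.
df_wide = pd.DataFrame({0: [0.1, 0.2, 0.3], 1: [1.0, 2.0, 3.0], "y": [0, 1, 0]})

# y is kept as an identifier column; every other column is stacked into
# (variable, value) pairs, one row per original cell.
df_long = df_wide.melt(id_vars="y")

print(df_long)
```

Each of the 3 rows contributes one long-format row per feature column, so the result has 6 rows and the columns y, variable and value, which are the names sns.boxplot is given above.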

Related

Stacked bar chart for a pandas df [duplicate]

I have a df like the one below and would like to plot a stacked bar chart where the x-axis is Component and the y-axis shows the count of each Priority ('Major', 'Minor', etc.).
   Component      Priority
0  Browse Groups  Minor
1  Notifications  Major
2  BI             Major
3  BI             Minor
4  BI             Minor
For example, the first bar would show the first component with a count of 1 Minor, and so on; the third bar would show 'BI' on the x-axis with 1 Major and 2 Minor stacked.
What is the simplest way to do this in seaborn or something similar?
You can groupby both columns and count on Priority, then unstack and plot as stacked bar chart:
df.groupby(['Component', 'Priority']).Priority.count().unstack().plot.bar(stacked=True)
Example:
import pandas as pd
df = pd.DataFrame({'Component': list('abccc'), 'Priority': ['Minor', 'Major', 'Major', 'Minor', 'Minor']})
df.groupby(['Component', 'Priority']).Priority.count().unstack().plot.bar(stacked=True)
As an alternative, you can use a crosstab:
pd.crosstab(df.Component, df.Priority).plot.bar(stacked=True)
If you want to use seaborn (I only now saw the seaborn tag), you can use a displot:
import seaborn as sns
sns.displot(x='Component', hue='Priority', data=df, multiple='stack')
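As a rough check of what the pandas approaches above actually plot, this sketch builds the count table for the question's sample data. It uses groupby(...).size() with fill_value=0 as a small variation on the answer's count().unstack(), because the plain unstack leaves NaN where a Component/Priority combination never occurs.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Component": ["Browse Groups", "Notifications", "BI", "BI", "BI"],
    "Priority": ["Minor", "Major", "Major", "Minor", "Minor"],
})

# One row per Component, one column per Priority, cells are counts.
counts = pd.crosstab(df["Component"], df["Priority"])

# groupby route; fill_value=0 replaces the NaN of missing combinations.
grouped = df.groupby(["Component", "Priority"]).size().unstack(fill_value=0)

print(counts)
```

Calling counts.plot.bar(stacked=True) on this table then draws one bar per row and stacks the columns.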

How can I create a bar chart in the image attached to the question? [duplicate]

I would like to create bar chart subplots where '% of total' is on the y-axis and 'plants' is on the x-axis. 'brand' should act as the legend, so in this case there would be 3 charts for the 3 different brands. Each group's percentages add up to 100%. I started with the code below, but got stuck. Please see the sample data and my attempt below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({
'brand':['A','A', 'A', 'B','B', 'B' ,'C','C', 'C'],
'plants':[0, 1, 2, 0,1,2,0,1,2],
'% of total':[80, 12, 8, 67, 18, 5,35, 40,25],
})
plt.figure(figsize=(10, 10))
for i, brand in enumerate(['A', 'B', 'C']):
You can use seaborn and catplot:
# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
sns.catplot(x='plants', y='% of total', col='brand', data=df, kind='bar')
plt.show()
Does this need to be in a for loop? You could simply select the relevant rows using pandas.
For example:
import matplotlib.pyplot as plt
my_A_df = df[df['brand'] == 'A']  # the label must be quoted; a bare A is an undefined name
plt.bar(my_A_df['plants'], my_A_df['% of total'])
Repeating this for each brand generates a bar plot for each. Not sure if this is within the bounds of your problem, but happy to edit if necessary.
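If the for loop from the question is preferred, the row selection above can be combined with plt.subplots. This is only a sketch of one way to finish that loop; the figure size, the shared y-axis and the titles are my own choices, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "brand": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "plants": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "% of total": [80, 12, 8, 67, 18, 5, 35, 40, 25],
})

brands = ["A", "B", "C"]
# One subplot per brand, side by side, sharing the y-axis for easy comparison.
fig, axes = plt.subplots(1, len(brands), figsize=(10, 4), sharey=True)
for ax, brand in zip(axes, brands):
    subset = df[df["brand"] == brand]
    ax.bar(subset["plants"], subset["% of total"])
    ax.set_title(f"brand {brand}")
    ax.set_xlabel("plants")
    ax.set_xticks(subset["plants"].tolist())
axes[0].set_ylabel("% of total")
fig.tight_layout()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end.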

Show values of hbar values in matplotlib [duplicate]

I want to show the values in a horizontal bar chart (barh), but I can't get them to render to the right of the bars.
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
index = ['average', '50th', '95th', '99.9th', 'max']
aaa = [3180, 2153, 9172, 9368, 9432]
bbb = [3857, 3367, 11638, 14555, 14731]
ccc = [740, 716, 1326, 1927, 2591]
df = pd.DataFrame({'aaa': aaa, 'bbb': bbb, 'ccc': ccc}, index=index)
ax = df.plot.barh()
How can I display the value next to each bar, and is it also possible to save the figure to a file, e.g. a .png?
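One possible approach, assuming matplotlib 3.4 or newer (where Axes.bar_label is available): label every bar container on the axes, then save through the figure object. The output file name percentiles.png and the padding and dpi values are arbitrary choices for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

index = ['average', '50th', '95th', '99.9th', 'max']
aaa = [3180, 2153, 9172, 9368, 9432]
bbb = [3857, 3367, 11638, 14555, 14731]
ccc = [740, 716, 1326, 1927, 2591]
df = pd.DataFrame({'aaa': aaa, 'bbb': bbb, 'ccc': ccc}, index=index)

ax = df.plot.barh()
# df.plot.barh() creates one bar container per column; label each of them.
for container in ax.containers:
    ax.bar_label(container, padding=3)  # padding is the gap in points between bar and text

ax.figure.tight_layout()
ax.figure.savefig("percentiles.png", dpi=150)  # hypothetical file name
```

On older matplotlib versions, the usual fallback is to loop over ax.patches and call ax.annotate with each bar's width and y position.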

Pandas time series plot xticklabel [duplicate]

I am trying to plot time series data, and label the x-axis by month.
These are the first few rows of my dataframe:
Time Value
2012-01-01 00:00:00 1.223
2012-01-01 00:00:30 2.132
2012-01-01 00:01:00 1.417
2012-01-01 00:01:30 1.767
And my code is as follows:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import MonthLocator
%matplotlib inline
df = pd.read_csv('data.csv', index_col=0)
value = df.value
plt.figure(figsize=(15,10))
ax = value.plot(x_compat=True)
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
plt.show()
As suggested in other answers, I already added the x_compat=True argument. However, this gives me no ticks or tick labels on the x-axis at all. How can I fix this?
I also tried a second approach using matplotlib directly:
plt.figure(figsize=(15,10))
plt.plot(df.index, df.value)
plt.gca().xaxis.set_major_locator(MonthLocator(bymonthday=15))
plt.gca().xaxis.set_major_formatter(DateFormatter("%M"))
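But I still can't see month tick labels; nothing appears on the x-axis.
Answer: I needed to convert df.index with pd.to_datetime first. read_csv with index_col=0 alone leaves the index as plain strings, so the date locator has no dates to work with.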
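A minimal sketch of that fix follows. It uses synthetic data in place of data.csv, which is not available here. Two further details in the question's code would also hide month labels and are changed in the sketch: NullFormatter suppresses all tick labels, and "%M" is the strftime code for minutes, whereas "%b" or "%m" gives the month.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter, MonthLocator

# Synthetic stand-in for data.csv: the index holds timestamp strings,
# which is what read_csv returns when the dates are not parsed.
times = pd.date_range("2012-01-01", "2012-06-30", freq="D").strftime("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({"Value": np.linspace(1.0, 2.0, len(times))}, index=times)

# The fix from the answer: turn the string index into a DatetimeIndex.
df.index = pd.to_datetime(df.index)

fig, ax = plt.subplots(figsize=(15, 10))
ax.plot(df.index, df["Value"])
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))  # one tick on the 15th of each month
ax.xaxis.set_major_formatter(DateFormatter("%b"))        # abbreviated month name
```

Alternatively, pd.read_csv('data.csv', index_col=0, parse_dates=True) parses the index while loading, which makes the separate to_datetime call unnecessary.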

How to select multi range of rows in pandas dataframe [duplicate]

Given a pandas dataframe with an index from 0 to 30, I would like to select the rows within several index ranges: [0:5], [10:15] and [20:25].
How can I do that?
Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))
You can then use np.r_ to index into ranges of rows [0:5], [10:15] and [20:25] as follows:
df.loc[np.r_[0:5, 10:15, 20:25], :]
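A small sketch of what np.r_ builds and how it interacts with the two indexers. The frame here uses deterministic values instead of random ones so the result is easy to inspect. With the default 0..n-1 index, loc and iloc select the same rows; on a frame whose index labels differ from the positions, iloc is the positional choice.

```python
import numpy as np
import pandas as pd

# Deterministic 30x4 frame with the default integer index 0..29.
df = pd.DataFrame(np.arange(120).reshape(30, 4), columns=list("ABCD"))

# np.r_ concatenates the slices into one flat integer array:
# 0..4, 10..14 and 20..24 (slice ends are exclusive).
rows = np.r_[0:5, 10:15, 20:25]

by_position = df.iloc[rows]  # selects by row position
by_label = df.loc[rows]      # selects by index label; identical here

print(by_position.index.tolist())
```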