Plotting multiple plotly pie charts for different string groups - pandas

I have a list of t-shirt orders with the corresponding sizes, and I would like to plot a pie chart for each design showing the percentage sold in each size (so I can see which size sells the most, etc.).
    Design      Total
0   Boba L          9
1   Boba M          4
2   Boba S          2
3   Boba XL         5
4   Burger L        6
5   Burger M        2
6   Burger S        3
7   Burger XL       1
8   Donut L         5
9   Donut M         9
10  Donut S         2
11  Donut XL        5

It is not completely clear what you are asking, but here is my interpretation:
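import matplotlib.pyplot as plt

# split "Boba L" into Design "Boba" and Size "L" (split on the last space only)
df[['Design', 'Size']] = df['Design'].str.rsplit(n=1, expand=True)
fig, ax = plt.subplots(1, 3, figsize=(10, 8))
ax = iter(ax)
for t, g in df.groupby('Design'):
    # one pie per design, with the sizes as slice labels
    g.set_index('Size')['Total'].plot.pie(ax=next(ax), autopct='%.2f', title=t)
plt.show()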
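Each pie above shows a size's share of its design's total. As a sketch (assuming the data is the question's table typed in by hand, with the combined Design column), the same percentages can be computed directly with groupby + transform, which is handy for checking what the pies display:

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'Design': ['Boba L', 'Boba M', 'Boba S', 'Boba XL',
               'Burger L', 'Burger M', 'Burger S', 'Burger XL',
               'Donut L', 'Donut M', 'Donut S', 'Donut XL'],
    'Total': [9, 4, 2, 5, 6, 2, 3, 1, 5, 9, 2, 5],
})

# Split "Boba L" into design "Boba" and size "L" on the last space
parts = df['Design'].str.rsplit(n=1, expand=True)
df['Design'] = parts[0]
df['Size'] = parts[1]

# Share of each size within its own design, in percent
df['Percent'] = 100 * df['Total'] / df.groupby('Design')['Total'].transform('sum')

# One row per design, one column per size
shares = df.pivot(index='Design', columns='Size', values='Percent').round(2)
print(shares)
```

Each row of `shares` sums to about 100 and holds the numbers that `autopct` prints on the corresponding pie.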
Or maybe you want a single pie of the best-selling size of each design:
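import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_clipboard()  # create data from the text above, no modification
# keep the row with the largest Total for each design (its best-selling size)
dfplot = df.loc[df.groupby(df['Design'].str.rsplit(n=1).str[0])['Total'].idxmax(), :]
ax = dfplot.set_index('Design')['Total'].plot.pie(autopct='%.2f')
ax.set_ylabel('')
plt.show()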
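The selection step above does not depend on the clipboard. A minimal sketch, building the frame by hand (an assumption, in place of `read_clipboard`), shows which rows `idxmax` keeps:

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'Design': ['Boba L', 'Boba M', 'Boba S', 'Boba XL',
               'Burger L', 'Burger M', 'Burger S', 'Burger XL',
               'Donut L', 'Donut M', 'Donut S', 'Donut XL'],
    'Total': [9, 4, 2, 5, 6, 2, 3, 1, 5, 9, 2, 5],
})

# Design name without the size suffix, used only as the grouping key
design = df['Design'].str.rsplit(n=1).str[0]

# idxmax returns the row label of the largest Total within each design
best = df.loc[df.groupby(design)['Total'].idxmax()]
print(best)
```

If two sizes tie for the maximum, `idxmax` keeps the first one, so only one slice per design appears in the pie.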

Let's do groupby.plot.pie:
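import matplotlib.pyplot as plt

# column 0 = design, column 1 = size; pie labels come from the index,
# so the size column is moved into the index before plotting
s = (df.Design.str.split(expand=True)
       .assign(Total=df['Total'])
       .set_index(1)
       .groupby(0)['Total']
       .plot.pie(autopct='%.1f%%'))
# format the plots: s maps each design to its Axes
for design, ax in s.items():
    ax.set_title(design)
plt.show()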
[Image: one of the output pie charts]

Related

Get and graph groupby result distribution of a column

I want to graph the distribution of a label column within each group. I was able to do so by creating dummies, creating a pivot table (crosstab) for each group, and then using a loop to build a new dataframe.
I am looking for a shorter way, maybe with more advanced groupby methods?
I also don't know how to create a side-by-side bar chart instead of the stacked bar chart I have here.
To recreate the dataframe:
import pandas as pd
import numpy as np
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'], size=30,
                     p=[0.1, 0.2, 0.3, 0.30, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})
Code that produces my desired graph:
dummy = pd.get_dummies(df['region'])
region_lst = []
label_0 = []
label_1 = []
for col in dummy.columns:
    region_lst.append(col)
    # row 1 of this crosstab holds the rows that belong to the region
    label_0.append(pd.crosstab(dummy[col], df['label']).iloc[1, 0])
    label_1.append(pd.crosstab(dummy[col], df['label']).iloc[1, 1])
df_labels = pd.DataFrame({'label_0': label_0, 'label_1': label_1}, index=region_lst)
df_labels.plot.bar()
Use crosstab with DataFrame.add_prefix to get the same output as your long code:
pd.crosstab(df['region'], df['label']).add_prefix('label_').plot.bar()
Details:
df_labels = pd.crosstab(df['region'], df['label']).add_prefix('label_')
print (df_labels)
label label_0 label_1
region
region_A 2 3
region_B 3 3
region_C 5 4
region_D 3 6
region_E 1 0
If you need to remove the axis names label and region:
df_labels = (pd.crosstab(df['region'], df['label'])
               .add_prefix('label_')
               .rename_axis(index=None, columns=None))
print (df_labels)
label_0 label_1
region_A 2 3
region_B 3 3
region_C 5 4
region_D 3 6
region_E 1 0
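Since the question asks about groupby specifically, here is a sketch of a groupby-only route to the same table: `size` counts each (region, label) pair and `unstack` turns the labels into columns. The comparison against `crosstab` is included only to show that both give the same counts on the question's seeded data.

```python
import numpy as np
import pandas as pd

# Recreate the question's data
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.30, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

# Count rows per (region, label); missing pairs become 0 instead of NaN
counts = (df.groupby(['region', 'label']).size()
            .unstack(fill_value=0)
            .add_prefix('label_'))

# Same table via crosstab, for comparison
via_crosstab = pd.crosstab(df['region'], df['label']).add_prefix('label_')
print(counts)
```

Calling `counts.plot.bar()` draws the labels side by side within each region by default; passing `stacked=True` stacks them instead.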
You can use a crosstab:
pd.crosstab(df['region'], df['label']).plot.bar()
[Image: the resulting bar chart]
Intermediate crosstab:
label 0 1
region
region_A 2 3
region_B 3 3
region_C 5 4
region_D 3 6
region_E 1 0
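If "distribution" should mean proportions rather than raw counts, `crosstab` can normalize each row by its total. A minimal sketch on the same seeded data (the choice of `normalize='index'` is an assumption about what the question wants):

```python
import numpy as np
import pandas as pd

# Recreate the question's data
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.30, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

# normalize='index' divides every row by its row total,
# giving the share of each label within each region
dist = pd.crosstab(df['region'], df['label'], normalize='index')
print(dist.round(2))
```

Plotting `dist.plot.bar()` then compares regions of different sizes on the same 0 to 1 scale.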

Scatter plot derived from two pandas dataframes with multiple columns in plotly [express]

I want to create a scatter plot that takes its x values from one dataframe and its y values from another, where both dataframes have multiple columns.
x_df:
red blue
0 1 2
1 2 3
2 3 4
y_df:
red blue
0 1 2
1 2 3
2 3 4
I want a scatter plot with two traces, red and blue, where the x values come from x_df and the y values come from y_df.
At some layer you need to do data integration; IMHO it is better done at the data layer, i.e. in pandas.
I have modified your sample data so the two traces do not overlap.
I used join(), assuming that the index of the data frames is the join key.
The dataframe could be structured further; however, I generated multiple traces using plotly express, modifying them as required to ensure colors and legends are created.
I have not considered axis labels.
import io

import pandas as pd
import plotly.express as px

x_df = pd.read_csv(io.StringIO(""" red blue
0 1 2
1 2 3
2 3 4"""), sep=r"\s+")
y_df = pd.read_csv(io.StringIO(""" red blue
0 1.1 2.2
1 2.1 3.2
2 3.1 4.2"""), sep=r"\s+")
# join on the index; suffixes tell the x and y columns apart
df = x_df.join(y_df, lsuffix="_x", rsuffix="_y")
# red trace, then append the data of a separately built blue trace
fig = px.scatter(df, x="red_x", y="red_y").update_traces(
    marker={"color": "red"}, name="red", showlegend=True
).add_traces(
    px.scatter(df, x="blue_x", y="blue_y")
    .update_traces(marker={"color": "blue"}, name="blue", showlegend=True)
    .data
)
fig.show()
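Another way to do the integration at the data layer is to reshape both frames into long form, with one row per (row index, color). This is a swapped-in alternative, not the answer's method; the column names `index`, `trace`, `x` and `y` are hypothetical choices for this sketch.

```python
import io

import pandas as pd

x_df = pd.read_csv(io.StringIO(""" red blue
0 1 2
1 2 3
2 3 4"""), sep=r"\s+")
y_df = pd.read_csv(io.StringIO(""" red blue
0 1.1 2.2
1 2.1 3.2
2 3.1 4.2"""), sep=r"\s+")

# One row per (original row, color) for each frame
x_long = x_df.reset_index().melt(id_vars='index', var_name='trace', value_name='x')
y_long = y_df.reset_index().melt(id_vars='index', var_name='trace', value_name='y')

# Pair every x with the y from the same row and color
long_df = x_long.merge(y_long, on=['index', 'trace'])
print(long_df)
```

With this shape a single call such as `px.scatter(long_df, x='x', y='y', color='trace', color_discrete_map={'red': 'red', 'blue': 'blue'})` should produce one trace and legend entry per color, without adding traces by hand.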

Can't get y-axis on matplotlib histogram to display the right numbers

I have this simple DataFrame, from which I am trying to plot a histogram:
Hour Count Average Count
2 6 4 0.129032
4 7 1 0.032258
1 12 9 0.290323
3 16 3 0.096774
0 20 2022 65.225806
What I want is Hour on the x-axis and Average Count on the y-axis. But when I tried this:
fig, hour = plt.subplots(1, 1)
hour.hist(test.Hour)
hour.set_xlabel('Time in 24 Hours')
hour.set_ylabel('Frequency')
plt.show()
Instead, I got a histogram of the Hour values. I have also tried test.Count and test['Average Count'], but both only affect the x-axis.
Are you looking for something like this?
'df' is the name of the dataframe.
df.plot(x='Hour', y='Average Count', kind='bar')
[Image: bar chart of Average Count per Hour]
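Why the histogram never shows Average Count: a histogram bins the values of a single column and counts how many values fall into each bin, so its y-axis is always a frequency. The sketch below (data transcribed from the question) uses `np.histogram`, which with its default of 10 bins computes the same counts `hist` draws, and then builds the Series a bar chart plots instead.

```python
import numpy as np
import pandas as pd

# Data transcribed from the question (original index kept)
test = pd.DataFrame({'Hour': [6, 7, 12, 16, 20],
                     'Count': [4, 1, 9, 3, 2022],
                     'Average Count': [0.129032, 0.032258, 0.290323,
                                       0.096774, 65.225806]},
                    index=[2, 4, 1, 3, 0])

# Histogram: how many Hour values land in each of 10 equal-width bins
counts, edges = np.histogram(test['Hour'], bins=10)
print(counts)

# Bar chart data: one bar per Hour whose height is Average Count
bars = test.sort_values('Hour').set_index('Hour')['Average Count']
print(bars)
```

`bars.plot.bar()` draws essentially what the answer's `df.plot(x='Hour', y='Average Count', kind='bar')` does, with the hours sorted.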

Map some columns of a data frame to other based on two column match pandas [duplicate]

This question already has answers here:
pandas: merged (inner join) data frame has more rows than the original ones
(1 answer)
Pandas Left Outer Join results in table larger than left table
(5 answers)
Closed 3 years ago.
I have two data frames, shown below:
df1:
Sector Plot Price Count
A 1 250 2
A 2 100 1
A 3 250 3
df2:
Sector Plot Usage Type
A 1 R Land
A 1 R Land
A 2 C Villa
A 3 R Plot
A 3 R Plot
A 3 R Plot
I would like to add the Usage and Type columns from df2 to df1, matching rows on Sector and Plot.
Expected Output:
Sector Plot Price Count Usage Type
A 1 250 2 R Land
A 2 100 1 C Villa
A 3 250 3 R Plot
I tried the code below:
df3 = pd.merge(df1, df2, left_on = ['Sector', 'Plot'],
right_on = ['Sector', 'Plot'], how = 'inner')
Add DataFrame.drop_duplicates, because there are duplicates in the second DataFrame:
df3 = pd.merge(df1,
df2.drop_duplicates(['Sector', 'Plot']),
on = ['Sector', 'Plot'])
print (df3)
Sector Plot Price Count Usage Type
0 A 1 250 2 R Land
1 A 2 100 1 C Villa
2 A 3 250 3 R Plot
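A sketch of why the original merge grows and how to guard against it: every df1 row is repeated once per matching df2 row, and `merge(..., validate='many_to_one')` raises `pandas.errors.MergeError` when the right-hand keys are not unique. The `how='left'` in the last merge is an assumption (keep every df1 row even without a match), not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'Sector': ['A', 'A', 'A'], 'Plot': [1, 2, 3],
                    'Price': [250, 100, 250], 'Count': [2, 1, 3]})
df2 = pd.DataFrame({'Sector': ['A'] * 6, 'Plot': [1, 1, 2, 3, 3, 3],
                    'Usage': ['R', 'R', 'C', 'R', 'R', 'R'],
                    'Type': ['Land', 'Land', 'Villa', 'Plot', 'Plot', 'Plot']})
keys = ['Sector', 'Plot']

# Without de-duplication: 2 + 1 + 3 matching rows
naive = pd.merge(df1, df2, on=keys)
print(len(naive))  # 6

# validate='many_to_one' requires each key to appear at most once in df2
raised = False
try:
    pd.merge(df1, df2, on=keys, validate='many_to_one')
except pd.errors.MergeError:
    raised = True

# De-duplicate first; the validation then passes
result = pd.merge(df1, df2.drop_duplicates(keys), on=keys,
                  how='left', validate='many_to_one')
print(result)
```

Note that `drop_duplicates(keys)` keeps the first row per key, so if duplicate rows in df2 disagreed on Usage or Type, the later values would be silently discarded.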

Plotting the average for each genre; pandas

import numpy as np
df = df.dropna(subset=['genres']).reset_index(drop=True)
splitted = df['genres'].str.split('|')
l = splitted.str.len()
x = df['gross'] / df['budget']
df = pd.DataFrame({x: np.repeat(df[x], l), 'genres':np.concatenate(splitted)})
d = {'mean':'Average Income'}
df1 = df.groupby('genres')[x].agg(['mean']).rename(columns=d)
df1.plot.bar()
plt.yscale("log")
plt.xlabel("Genre")
I want to plot the average of x for each genre (since a single movie can have multiple genres, I split them into single ones), but my code does not do what I want and raises an error. I need some assistance.
[Image: error message]
I think that if you need to aggregate with only one function, groupby + mean is more common:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'genres':['Comedy|Crime|Drama|Thriller','Comedy|Crime|Drama',
                             'Comedy|Crime','Drama|Thriller','Drama','Comedy|Crime'],
                   'gross':[10,20,30,40,50,60],
                   'budget':[3,4,5,3,2,5]})
df = df.dropna(subset=['genres']).reset_index(drop=True)
splitted = df['genres'].str.split('|')
l = splitted.str.len()
x = df['gross'] / df['budget']
# a new column name (divided) is necessary, and x is repeated directly instead of df[x]
df = pd.DataFrame({'divided': np.repeat(x.to_numpy(), l), 'genres': np.concatenate(splitted)})
print (df)
divided genres
0 3.333333 Comedy
1 3.333333 Crime
2 3.333333 Drama
3 3.333333 Thriller
4 5.000000 Comedy
5 5.000000 Crime
6 5.000000 Drama
7 6.000000 Comedy
8 6.000000 Crime
9 13.333333 Drama
10 13.333333 Thriller
11 25.000000 Drama
12 12.000000 Comedy
13 12.000000 Crime
# aggregate the divided column (not x), because this is the new df created by repeat
df1 = df.groupby('genres')['divided'].mean().reset_index(name='return')
df1.plot.bar(x='genres', y='return')
plt.yscale("log")
plt.xlabel("Genre")
plt.show()
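On pandas 0.25 or newer, the manual repeat/concatenate step can be replaced by `DataFrame.explode`, which turns each list of genres into one row per genre. This is a swapped-in technique rather than the answer's; the column name `ratio` is a hypothetical choice for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'genres': ['Comedy|Crime|Drama|Thriller', 'Comedy|Crime|Drama',
                              'Comedy|Crime', 'Drama|Thriller', 'Drama', 'Comedy|Crime'],
                   'gross': [10, 20, 30, 40, 50, 60],
                   'budget': [3, 4, 5, 3, 2, 5]})

avg = (df.dropna(subset=['genres'])
         .assign(ratio=lambda d: d['gross'] / d['budget'],
                 genres=lambda d: d['genres'].str.split('|'))
         .explode('genres')          # one row per (movie, genre)
         .groupby('genres')['ratio']
         .mean())
print(avg)
```

The result is a Series indexed by genre, so `avg.plot.bar(logy=True)` should give the same kind of log-scaled bar chart.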