How to chart two different pandas data frames into one chart on matplotlib? - pandas

I have two separate sets of data using pandas:
>>> suicides_sex = suicides_russia.groupby("sex")["suicides_no"].sum()
>>> suicides_sex
sex
female 214330
male 995412
and
>>> suicides_age = suicides_russia.groupby("age")["suicides_no"].sum().sort_values()
>>> suicides_age
age
5-14 years 8840
75+ years 74211
15-24 years 148611
25-34 years 231187
55-74 years 267753
35-54 years 479140
I want to learn how to create either a double bar chart using matplotlib or two separate bar charts where I can separate each age group by gender.
How can I combine both sets of data to create either a single bar chart with double columns or two separate bar charts for each gender?

You can use boolean masks to separate the data and then group by age as you did.
import matplotlib.pyplot as plt
import numpy as np
suicides_male = suicides_russia.loc[suicides_russia['sex'] == 'male', :]
# now you basically have the same dataframe, but for males only
suicides_male_age = suicides_male.groupby("age")["suicides_no"].sum()
x = np.arange(len(suicides_male_age))  # one bar position per age group
plt.bar(x=x, height=suicides_male_age.values)
plt.xticks(ticks=x, labels=suicides_male_age.index)
plt.show()
Then you can repeat the same for female. That is probably not the most efficient way of doing it, but it works.
Also, I assumed the 'age' column values are strings, so I put np.arange as x positions of the bars and the values themselves as xticks.
Hope it helps!
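If a single chart with two bars per age group is preferred, one possible approach is to group by both columns and unstack the sex level, so pandas draws one bar per sex inside each age cluster. This is only a sketch: the small suicides_russia frame below is made-up stand-in data, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import pandas as pd

# Hypothetical stand-in for suicides_russia, with the three columns the question uses.
suicides_russia = pd.DataFrame({
    "age": ["5-14 years", "5-14 years", "15-24 years", "15-24 years", "15-24 years"],
    "sex": ["female", "male", "female", "male", "male"],
    "suicides_no": [10, 30, 100, 400, 50],
})

# Sum per (age, sex), then move 'sex' from the index into the columns.
by_age_sex = (
    suicides_russia.groupby(["age", "sex"])["suicides_no"].sum().unstack("sex")
)

# Each row (age group) becomes a cluster; each column (sex) becomes one bar in it.
ax = by_age_sex.plot(kind="bar", rot=45)
ax.set_ylabel("suicides_no")
```

Passing subplots=True to the same plot call should give two separate bar charts, one per sex, instead of a single grouped one.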

Related

Making multiple pie charts out of a pandas dataframe (one for each column)

My question is similar to Making multiple pie charts out of a pandas dataframe (one for each row).
However, instead of each row, I am looking for each column in my case.
I can make a pie chart for each column; however, as I have 12 columns, the pie charts end up too close to each other.
I have used this code:
fig, axes = plt.subplots(4, 3, figsize=(10, 6))
for i, (idx, row) in enumerate(df.iterrows()):
    ax = axes[i // 3, i % 3]
    row = row[row.gt(row.sum() * .01)]
    ax.pie(row, labels=row.index, startangle=30)
    ax.set_title(idx)
fig.subplots_adjust(wspace=.2)
and I get the following result:
But what I want is the other way around. I need 12 pie charts (because I have 12 columns), and each pie chart should have 4 sections (leg, car, walk, and bike).
and if I write this code
fig, axes = plt.subplots(4,3)
for i, col in enumerate(df.columns):
    ax = axes[i // 3, i % 3]
    plt.plot(df[col])
then I have the following results:
and if I use :
plot = df.plot.pie(subplots=True, figsize=(17, 8),labels=['pt','car','walk','bike'])
then I have the following results:
This is close to what I am looking for, but the pie charts are not readable. A clearer output would be better.
As in your linked post, I would use matplotlib.pyplot for this. The accepted answer there uses plt.subplots(2, 3) to create two rows of three plots each; with 12 columns you need 12 axes, for example four rows of three.
Like this:
fig, axes = plt.subplots(4, 3, figsize=(12, 12))
for i, col in enumerate(df.columns):
    ax = axes[i // 3, i % 3]
    ax.pie(df[col], labels=df.index)
    ax.set_title(col)
Finally, I understood that if I swap rows and columns:
df_sw = df.T
then I can use the code from the example in "Making multiple pie charts out of a pandas dataframe (one for each row)".
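Putting the transpose together with the row-wise loop might look like the sketch below. The data, the column names c1..c12, and the figure size are all made up for illustration; only the df.T step and the pie loop come from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 4 travel modes (rows) x 12 columns named c1..c12.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 100, size=(4, 12)),
    index=["pt", "car", "walk", "bike"],
    columns=[f"c{i}" for i in range(1, 13)],
)

df_sw = df.T  # each original column is now a row

# A taller figure leaves more room between the 12 pies.
fig, axes = plt.subplots(4, 3, figsize=(12, 14))
for ax, (name, row) in zip(axes.ravel(), df_sw.iterrows()):
    ax.pie(row, labels=row.index, startangle=30)
    ax.set_title(name)
fig.tight_layout()
```

Increasing figsize and calling fig.tight_layout() is one way to keep the labels from colliding.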

Stacked bars with hue in seaborn and pandas [duplicate]

I have a dataframe that looks like this:
df = pd.DataFrame(columns=["type", "App", "Feature1", "Feature2", "Feature3",
                           "Feature4", "Feature5",
                           "Feature6", "Feature7", "Feature8"],
                  data=[["type1", "SHA", 0, 0, 1, 5, 1, 0, 1, 0],
                        ["type2", "LHA", 1, 0, 1, 1, 0, 1, 1, 0],
                        ["type2", "FRA", 1, 0, 2, 1, 1, 0, 1, 1],
                        ["type1", "BRU", 0, 0, 1, 0, 3, 0, 0, 0],
                        ["type2", "PAR", 0, 1, 1, 4, 1, 0, 1, 0],
                        ["type2", "AER", 0, 0, 1, 1, 0, 1, 1, 0],
                        ["type1", "SHE", 0, 0, 0, 1, 0, 0, 1, 0]])
I want to make a stacked bar plot with type as the hue. That is, I want the features on the x axis, and for each feature I want two stacked bars, one for type1 and one for type2.
For instance, here they explain how to make a stacked bar plot with seaborn when the column type is dropped. Instead, I want two stacked bars for each feature. Note: the values of App are shared between type1 and type2.
For instance, if I just plot the stacked bars corresponding to type1, I get this:
I want to make a stacked bar plot where for each feature there are two stacked bars, one for type1, and the other one for type2
I don't think seaborn has a function for barplots that are both stacked and grouped. But you can do it in matplotlib itself by hand. Here is an example.
I think what you are looking for is the melt function:
d = df.drop(columns='App')
d = d.melt('type', var_name='a', value_name='b')
sns.barplot(x='a', y='b', data=d, hue='type')
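For the "by hand" matplotlib route mentioned above, a rough sketch could look like the following. It uses the question's dataframe; the bar width, the offsets, the legend labels, and the totals dictionary are illustrative choices rather than anything from the original answers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

features = [f"Feature{i}" for i in range(1, 9)]
df = pd.DataFrame(
    columns=["type", "App"] + features,
    data=[["type1", "SHA", 0, 0, 1, 5, 1, 0, 1, 0],
          ["type2", "LHA", 1, 0, 1, 1, 0, 1, 1, 0],
          ["type2", "FRA", 1, 0, 2, 1, 1, 0, 1, 1],
          ["type1", "BRU", 0, 0, 1, 0, 3, 0, 0, 0],
          ["type2", "PAR", 0, 1, 1, 4, 1, 0, 1, 0],
          ["type2", "AER", 0, 0, 1, 1, 0, 1, 1, 0],
          ["type1", "SHE", 0, 0, 0, 1, 0, 0, 1, 0]],
)

x = np.arange(len(features))         # one cluster per feature
types = sorted(df["type"].unique())  # ['type1', 'type2']
width = 0.8 / len(types)             # width of each bar inside a cluster
totals = {}                          # final stack height per type, kept for checking

fig, ax = plt.subplots(figsize=(10, 4))
for k, t in enumerate(types):
    offset = (k - (len(types) - 1) / 2) * width  # shift left/right of the cluster centre
    bottom = np.zeros(len(features))
    for _, row in df[df["type"] == t].iterrows():
        heights = row[features].to_numpy(dtype=float)
        ax.bar(x + offset, heights, width, bottom=bottom, label=f"{t} / {row['App']}")
        bottom += heights  # the next App is stacked on top of this one
    totals[t] = bottom

ax.set_xticks(x)
ax.set_xticklabels(features)
ax.legend(fontsize="small", ncol=2)
```

Each App gets its own segment, so within every feature cluster the left stack is type1 and the right stack is type2.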

How to plot a stacked bar using the groupby data from the dataframe in python?

I am reading a huge CSV file using the pandas module.
filename = pd.read_csv(filepath)
Converted to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am interested in three columns: country, year, and value.
I grouped by country name, summed the values, and plotted the result as a bar graph with the following code.
df.groupby('country').value.sum().plot(kind='bar')
Here the x axis is country and the y axis is value.
Now I want to turn this bar graph into a stacked bar, using the third column, year, so that differently colored segments represent each year. I am looking for an easy way to do this.
Note that the year column contains years from 2000 to 2019.
Thanks.
From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
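As a small illustration of what the unstack step does, here is a sketch on made-up data with the same three column names. The values are invented, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import pandas as pd

# Hypothetical sample with the three columns named in the question.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [2000, 2000, 2001, 2000, 2001],
    "value": [1.0, 2.0, 4.0, 8.0, 16.0],
})

# Rows: country, columns: year, cells: summed value.
per_year = df.groupby(["country", "year"])["value"].sum().unstack()

# stacked=True piles the year columns on top of each other within each country bar.
ax = per_year.plot(kind="bar", stacked=True)
ax.set_ylabel("value")
```

If some country has no rows for a given year, the unstacked cell is NaN; unstack(fill_value=0) is one way to make that explicit.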

Can you make 4 bars with 4 columns of data using matplotlib?

I need to make a bar graph where x-axis plot the index ([1992,1993,1994,1995]) and y-axis will plot 4 columns of random data as shown below. It will generate 4 bars for 4 columns of data. How can I do that? I am pretty new to matplotlib. Please help. This is the DataFrame:
import numpy as np
import pandas as pd
np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000, 200000, 3650),
                   np.random.normal(43000, 100000, 3650),
                   np.random.normal(43500, 140000, 3650),
                   np.random.normal(48000, 70000, 3650)],
                  index=[1992, 1993, 1994, 1995])
df.head()
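One possible reading of this question is that each year should get a single bar whose height is the mean of that year's 3650 samples. Under that assumption (the question does not say which statistic to plot), a sketch could be:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(12345)
df = pd.DataFrame([np.random.normal(32000, 200000, 3650),
                   np.random.normal(43000, 100000, 3650),
                   np.random.normal(43500, 140000, 3650),
                   np.random.normal(48000, 70000, 3650)],
                  index=[1992, 1993, 1994, 1995])

# Assumption: bar height = mean of each row (one row per year).
means = df.mean(axis=1)
# Standard error of the mean, drawn as an optional error bar.
sems = df.sem(axis=1)

fig, ax = plt.subplots()
# Converting the years to strings makes matplotlib treat them as categories.
ax.bar(means.index.astype(str), means.values, yerr=sems.values, capsize=5)
ax.set_xlabel("year")
ax.set_ylabel("mean value")
```

If a different summary is wanted (median, sum), only the means line needs to change.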

Overlaying actual data on a boxplot from a pandas dataframe

I am using Seaborn to make boxplots from pandas dataframes. Seaborn boxplots seem to essentially read the dataframes the same way as the pandas boxplot functionality (so I hope the solution is the same for both -- but I can just use the dataframe.boxplot function as well). My dataframe has 12 columns and the following code generates a single plot with one boxplot for each column (just like the dataframe.boxplot() function would).
fig, ax = plt.subplots()
sns.set_style("darkgrid", {"axes.facecolor":"darkgrey"})
pal = sns.color_palette("husl",12)
sns.boxplot(dataframe, color = pal)
Can anyone suggest a simple way of overlaying all the values (by columns) while making a boxplot from dataframes?
I will appreciate any help with this.
This hasn't been added to the seaborn.boxplot function yet, but there's something similar in the seaborn.violinplot function, which has other advantages:
x = np.random.randn(30, 6)
sns.violinplot(x, inner="points")
sns.despine(trim=True)
Here is a general solution for boxplotting an entire dataframe. It should work for both seaborn and pandas, as they are both built on matplotlib. I will use the pandas plot as the example, assuming import matplotlib.pyplot as plt is already in place. As you already have the ax, it would make more sense to use ax.text(...) instead of plt.text(...).
In [35]:
print(df)
V1 V2 V3 V4 V5
0 0.895739 0.850580 0.307908 0.917853 0.047017
1 0.931968 0.284934 0.335696 0.153758 0.898149
2 0.405657 0.472525 0.958116 0.859716 0.067340
3 0.843003 0.224331 0.301219 0.000170 0.229840
4 0.634489 0.905062 0.857495 0.246697 0.983037
5 0.573692 0.951600 0.023633 0.292816 0.243963
[6 rows x 5 columns]
In [34]:
df.boxplot()
# .T.ravel() flattens column by column, matching the repeated x positions 1,1,...,2,2,...
for x, y in zip(np.repeat(np.arange(df.shape[1]) + 1, df.shape[0]),
                df.values.T.ravel()):
    plt.text(x, y, f'{y:.3f}', ha='center', va='center')
For a single series in the dataframe, a few small changes are necessary:
In [35]:
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
for x, y in zip(np.repeat(1, df.shape[0]), sub_df.values):
    plt.text(x, y, f'{y:.3f}', ha='center', va='center')
Making scatter plots is similar:
#for the whole thing
df.boxplot()
plt.scatter(np.repeat(np.arange(df.shape[1]) + 1, df.shape[0]), df.values.T.ravel(), marker='+', alpha=0.5)
#for just one column
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
plt.scatter(np.repeat(1, df.shape[0]), sub_df.values, marker='+', alpha=0.5)
To overlay anything on a boxplot, we first need to know where each box is drawn along the x axis. They appear to be at 1, 2, 3, 4, and so on. Therefore, we want the values in the first column plotted at x=1, the second column at x=2, and so on.
An efficient way of doing it is to use np.repeat to repeat 1, 2, 3, 4, ..., each n times, where n is the number of observations. Then we can make a plot using those numbers as x coordinates. Since that array is one-dimensional, the y coordinates must be a flattened view of the data in the same column-by-column order, which df.values.T.ravel() provides (a plain df.values.ravel() goes row by row and would put values over the wrong boxes).
For overlaying the text strings, we need another step (a loop), as we can only place one x value, one y value, and one text string at a time.
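The x/y pairing described above can be checked on a small self-contained example. This is a sketch with random stand-in data; the names xs and ys are just illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data with the same shape as the example above (6 rows, 5 columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 5)), columns=["V1", "V2", "V3", "V4", "V5"])

fig, ax = plt.subplots()
df.boxplot(ax=ax)  # boxes are drawn at x = 1, 2, ..., number of columns

n_rows, n_cols = df.shape
xs = np.repeat(np.arange(1, n_cols + 1), n_rows)  # 1,1,1,1,1,1,2,2,...
ys = df.to_numpy().T.ravel()                      # column by column, same order as xs
ax.scatter(xs, ys, marker="+", alpha=0.5)
```

Selecting ys[xs == 2] should return exactly the V2 column, which is a quick way to confirm that every point sits over its own box.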
I have the following trick:
data = np.random.randn(6,5)
df = pd.DataFrame(data,columns = list('ABCDE'))
Now assign a dummy column to df:
df['Group'] = 'A'
print(df)
A B C D E Group
0 0.590600 0.226287 1.552091 -1.722084 0.459262 A
1 0.369391 -0.037151 0.136172 -0.772484 1.143328 A
2 1.147314 -0.883715 -0.444182 -1.294227 1.503786 A
3 -0.721351 0.358747 0.323395 0.165267 -1.412939 A
4 -1.757362 -0.271141 0.881554 1.229962 2.526487 A
5 -0.006882 1.503691 0.587047 0.142334 0.516781 A
Then call boxplot() on the grouped dataframe and you are done:
df.groupby('Group').boxplot()