How to plot a stacked bar using the groupby data from the dataframe in python? - pandas

I am reading a huge CSV file using the pandas module.
filename = pd.read_csv(filepath)
Then I convert it to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am only interested in three columns: country, year, and value.
I have grouped the data by country, summed the values, and plotted the result as a bar graph with the following code:
df.groupby('country').value.sum().plot(kind='bar')
Here the x-axis is country and the y-axis is value.
Now I want to turn this bar graph into a stacked bar, using the third column, year, so that each year is a differently coloured segment. I am looking for an easy way to do this.
Note that the year column contains years from 2000 to 2019.
Thanks.

From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
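To make the groupby/unstack step concrete, here is a minimal sketch on made-up data. It assumes the columns are named country, year and value (lowercase, as described in the question); the sample rows and the fill_value=0 argument are my own additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up sample data with the three columns named in the question.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    "year":    [2000, 2000, 2001, 2000, 2001, 2001],
    "value":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Sum per (country, year), then move 'year' from the index into the columns.
# fill_value=0 covers country/year pairs that never occur in the data.
totals = df.groupby(["country", "year"])["value"].sum().unstack(fill_value=0)

# One bar per country, one coloured segment per year.
ax = totals.plot(kind="bar", stacked=True)
ax.set_ylabel("value")
```

After unstack, each row of totals is a country and each column is a year, which is the wide shape that plot(kind='bar', stacked=True) expects.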

Related

How to visualize single column from pandas dataframe

I'm new to data science & pandas. I'm just trying to visualize the distribution of data from a single series (a single column), but the histogram that I'm generating is only a single column (see below where it's sorted descending).
My data has over 11 million rows. The max value is 27,235 and the min value is 1. I'd like to see the "count" column grouped into different bins, with one bar per bin whose height is the total for that bin. But I'm only seeing a single bar and am not sure what to do.
Data
df = pd.DataFrame({'count':[27235,26000,25877]})
Solution
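import matplotlib.pyplot as plt
df['count'].hist()
plt.show()
Alternatively, with seaborn:
import seaborn as sns
sns.histplot(df['count'])  # replaces sns.distplot, which is deprecated since seaborn 0.11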
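One likely cause of the single bar is that a few very large values stretch the axis, so almost every row lands in the first of the default equal-width bins. A hedged sketch of one way around that, using explicit log-spaced bin edges; the sample data and the choice of edges are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up, heavily skewed stand-in for the real 'count' column.
counts = pd.Series([1] * 1000 + [12] * 100 + [150] * 10 + [27235], name="count")

# Log-spaced bin edges from 1 to 100000 (two bins per decade).
bins = np.logspace(0, 5, num=11)

# Rows per bin, useful for checking the numbers behind the picture.
rows_per_bin, _ = np.histogram(counts, bins=bins)

fig, ax = plt.subplots()
ax.hist(counts, bins=bins)
ax.set_xscale("log")  # log x-axis so the log-spaced bins look equally wide
ax.set_yscale("log")  # log y-axis so the rare large values stay visible
ax.set_xlabel("count")
ax.set_ylabel("rows per bin")
```

Whether log scaling suits the real data depends on its shape; with a less skewed column, passing an integer such as bins=50 to hist may be enough.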

How to visualize 'suicides_no' w.r.t 'gdp_per_capita ($)' for a given country over the years, in the following data frame

The DataFrame can be viewed here: Global Suicide Dataset
I have made a pivot table with country and year as indices using the following code:
df1 = pd.pivot_table(df, index=['country', 'year'],
                     values=['suicides_no', 'gdp_per_capita ($)', 'population', 'suicides/100k pop'],
                     aggfunc={"suicides_no": np.sum,
                              "gdp_per_capita ($)": np.mean,
                              "population": np.mean,
                              "suicides/100k pop": np.mean})
Output:
Now, for my project, I want to visualize how suicides_no varies with gdp_per_capita for a country over the years, but I am unable to plot it. Can somebody please help me out?
First, let's convert the index levels to columns using df1.reset_index(inplace=True).
Now you can draw this as a scatter plot with year on the x-axis and suicides_no on the y-axis; gdp_per_capita is then encoded as either the colour (hue) or the size of the dots.
In this case you have two options:
Draw different plots for each country. (gdp will be shown as hue)
sns.catplot(x='year', y='suicides_no', row='country', hue='gdp_per_capita ($)', data=df1)
Draw everything in a single plot. Scatter plot with GDP as dot size, and Country as Color (hue)
sns.scatterplot(x='year', y='suicides_no', hue='country', size='gdp_per_capita ($)', data=df1)
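If the goal is one country at a time, a different option (not what the answer above proposes) is a plain matplotlib line plot with two y-axes. This is only a sketch: the numbers are made up, "X" is a placeholder country name, and the twin-axis layout is my own assumption about what is useful here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers shaped like the pivot table in the question.
df1 = pd.DataFrame({
    "country": ["X"] * 3 + ["Y"] * 3,
    "year": [2000, 2001, 2002] * 2,
    "suicides_no": [100, 120, 90, 40, 45, 50],
    "gdp_per_capita ($)": [1000.0, 1100.0, 1300.0, 5000.0, 5200.0, 5100.0],
}).set_index(["country", "year"])

# Turn the (country, year) index levels back into ordinary columns.
flat = df1.reset_index()
one = flat[flat["country"] == "X"]  # "X" is a placeholder country name

fig, ax_left = plt.subplots()
ax_left.plot(one["year"], one["suicides_no"], marker="o", color="tab:red")
ax_left.set_xlabel("year")
ax_left.set_ylabel("suicides_no")

# Second y-axis sharing the same x-axis, for the GDP series.
ax_right = ax_left.twinx()
ax_right.plot(one["year"], one["gdp_per_capita ($)"], marker="s", color="tab:blue")
ax_right.set_ylabel("gdp_per_capita ($)")
```

Two axes keep both series readable even though their magnitudes differ a lot, at the cost of showing only one country per figure.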

How to chart two different pandas data frames into one chart on matplotlib?

I have two separate sets of data using pandas:
>>> suicides_sex = suicides_russia.groupby("sex")["suicides_no"].sum()
>>> suicides_sex
sex
female 214330
male 995412
and
>>> suicides_age = suicides_russia.groupby("age")["suicides_no"].sum().sort_values()
>>> suicides_age
age
5-14 years 8840
75+ years 74211
15-24 years 148611
25-34 years 231187
55-74 years 267753
35-54 years 479140
I want to learn how to create either a double bar chart using matplotlib or two separate bar charts where I can separate each age group by gender.
How can I combine both sets of data to create either a single bar chart with double columns or two separate bar charts for each gender?
You can use boolean masks to separate the data and then group by age as you did.
import matplotlib.pyplot as plt
import numpy as np
suicides_male = suicides_russia.loc[suicides_russia['sex'] == 'male', :]
# now you basically have the same dataframe but for male only
suicides_male_age = suicides_male.groupby("age")["suicides_no"].sum()
x = np.arange(len(suicides_male_age))  # one bar position per age group
plt.bar(x=x, height=suicides_male_age.values)
plt.xticks(ticks=x, labels=suicides_male_age.index)
plt.show()
Then you can repeat the same for female. That is probably not the most efficient way of doing it, but it works.
Also, I assumed the 'age' column values are strings, so I used np.arange for the x positions of the bars and the age labels themselves as the xticks.
Hope it helps!
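For the "double bar" variant asked about in the question, grouping by both columns at once and unstacking may be shorter than masking each sex separately. A minimal sketch on made-up rows; the column names follow the question, the numbers do not come from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up rows with the three columns used in the question.
suicides_russia = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male"],
    "age": ["5-14 years", "5-14 years", "15-24 years", "15-24 years", "15-24 years"],
    "suicides_no": [3, 1, 10, 4, 5],
})

# Rows are age groups, columns are sexes.
by_age_sex = (
    suicides_russia.groupby(["age", "sex"])["suicides_no"].sum().unstack("sex")
)

# Side-by-side bars: one group per age, one bar per sex.
ax = by_age_sex.plot(kind="bar")
ax.set_ylabel("suicides_no")
```

Passing subplots=True to the same plot call should give two separate bar charts, one per sex, instead of paired bars. Note that the age labels sort as strings, so they may need reordering by hand.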

How can I draw Yearly series using monthly data from a DateTimeIndex in Matplotlib?

I have monthly data of 6 variables from 2014 until 2018 in one dataset.
I'm trying to draw 6 subplots (one for each variable) with monthly X axis (Jan, Feb....) and 5 series (one for each year) with their legend.
This is part of the data:
I created 5 series (one for each year) per variable (30 in total) and I'm getting the expected output but using MANY lines of code.
What is the best way to achieve this using fewer lines of code?
This is an example how I created the series:
CL2014 = data_total['Charity Lottery'].where(data_total['Date'].dt.year == 2014)[0:12]
CL2015 = data_total['Charity Lottery'].where(data_total['Date'].dt.year == 2015)[12:24]
This is an example of how I'm plotting the series:
axCL.plot(xvals, CL2014)
axCL.plot(xvals, CL2015)
axCL.plot(xvals, CL2016)
axCL.plot(xvals, CL2017)
axCL.plot(xvals, CL2018)
There's no need to litter your namespace with 30 variables. Seaborn makes the job very easy but you need to normalize your dataframe first. This is what "normalized" or "unpivoted" looks like (Seaborn calls this "long form"):
Date        variable         value
2014-01-01  Charity Lottery  ...
2014-01-01  Racecourse       ...
2014-04-01  Bingo Halls      ...
2014-04-01  Casino           ...
Your screenshot is a "pivoted" or "wide form" dataframe.
df_plot = pd.melt(df, id_vars='Date')
df_plot['Year'] = df_plot['Date'].dt.year
df_plot['Month'] = df_plot['Date'].dt.strftime('%b')
import seaborn as sns
plot = sns.catplot(data=df_plot, x='Month', y='value',
                   row='Year', col='variable', kind='bar',
                   sharex=False)
plot.savefig('figure.png', dpi=300)
Result (all numbers are randomly generated):
I would try using .groupby(); it is really powerful for paring down things like this:
# year, month, x_variable and y_variable are placeholders for your own column names
for _, group in data_total.groupby([year, month])[[x_variable, y_variable]]:
    plt.plot(group[x_variable], group[y_variable])
So here the groupby will separate your data_total DataFrame into year/month subsets, with the [[]] on the end to pare it down to the x_variable (assuming it is in your data_total DataFrame) and your y_variable, which can be any of the features you are interested in.
I would decompose your datetime column into separate year and month columns, then use those new columns inside that groupby as the [year, month]. You might be able to pass in the dt.year and dt.month like you had before... not sure, try it both ways!
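Passing the dt.year Series straight to groupby does work, so the extra columns are optional. A rough sketch for a single variable, grouping by year only so that each year becomes one line; the 24 months of made-up data and the axis setup are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly data for one of the six variables.
dates = pd.date_range("2014-01-01", periods=24, freq="MS")
data_total = pd.DataFrame({"Date": dates, "Charity Lottery": range(24)})

fig, ax = plt.subplots()
# Group by calendar year; each group becomes one line over months 1..12.
for year, group in data_total.groupby(data_total["Date"].dt.year):
    ax.plot(group["Date"].dt.month, group["Charity Lottery"], label=str(year))

ax.set_xticks(range(1, 13))
ax.set_xlabel("month")
ax.legend(title="Year")
```

Wrapping the loop in a second loop over the six variable columns, with one Axes from plt.subplots per variable, would cover all 30 series without creating 30 named variables.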

Visualizing pandas grouped data

Hi I am working on the following dataset
Dataset
df = pd.read_csv('https://github.com/datameet/india-election-data/blob/master/parliament-elections/parliament.csv')
df.groupby(['YEAR','PARTY'])['PC'].nunique()
How do I create a stacked bar plot with year on the x-axis, PC count on the y-axis, and the party names as the stacked segments? Basically, I want to display the top 5 parties each year by value and bucket all other parties (excluding IND) as 'others'.
Want to visualize something like this Election Viz
IIUC this should work:
sd = df.groupby(['YEAR','PARTY'])['PC'].nunique().reset_index()
sd.pivot(index='YEAR',values='PC',columns='PARTY').plot(kind='bar',stacked=True,figsize=(8,8))
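The snippet above stacks every party. For the "top 5 per year, rest as others" part of the question, one possible approach is to rank parties within each year before pivoting. This is a sketch on made-up counts, not tested against the real file; TOP_N, the bucket column and the tie-breaking via method="first" are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

TOP_N = 5  # how many parties to keep per year

# Made-up per-party counts, shaped like the groupby(...).nunique().reset_index() result.
sd = pd.DataFrame({
    "YEAR":  [2004] * 7 + [2009] * 7,
    "PARTY": ["A", "B", "C", "D", "E", "F", "IND"] * 2,
    "PC":    [50, 40, 30, 20, 10, 5, 8, 5, 10, 20, 30, 40, 50, 9],
})

# Drop independents, as requested in the question.
sd = sd[sd["PARTY"] != "IND"].copy()

# Rank parties within each year; rank 1 is the largest PC count.
rank = sd.groupby("YEAR")["PC"].rank(method="first", ascending=False)

# Keep the party name for the top TOP_N, relabel the rest as 'others'.
sd["bucket"] = sd["PARTY"].where(rank <= TOP_N, "others")

# Sum per (year, bucket) and pivot the buckets into columns.
table = sd.groupby(["YEAR", "bucket"])["PC"].sum().unstack(fill_value=0)
ax = table.plot(kind="bar", stacked=True, figsize=(8, 8))
```

Note that the 'others' segment here is the sum of the per-party counts, so a constituency contested by several small parties is counted once per party.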