How to visualize 'suicides_no' w.r.t 'gdp_per_capita ($)' for a given country over the years, in the following data frame - pandas

The DataFrame can be viewed here: Global Suicide Dataset
I have made a pivot table with country and year as indices using the following code:
df1 = pd.pivot_table(df, index = ['country', 'year'],
values=['suicides_no','gdp_per_capita ($)', 'population', 'suicides/100k pop'],
aggfunc = {"suicides_no" : np.sum
,"gdp_per_capita ($)" : np.mean
,"population" : np.mean
,"suicides/100k pop" : np.mean})
Output:
Now, for my project, I want to visualize how suicides_no varies with gdp_per_capita for a country over the years, but I am unable to plot it. Can somebody please help me out?

First, let's convert the index levels to columns using df1.reset_index(inplace=True)
Now you can draw this as a scatter plot where the main features are year (preferably on the x-axis) and suicides_no (on the y-axis). The gdp_per_capita can be encoded as the size or the color of the dots.
In this case you have two options:
Draw a separate plot for each country (GDP is shown as hue):
sns.catplot(x='year', y='suicides_no', row='country', hue='gdp_per_capita ($)', data=df1)
Draw everything in a single plot: a scatter plot with GDP as dot size and country as color (hue):
sns.scatterplot(x='year', y='suicides_no', hue='country', size='gdp_per_capita ($)', data=df1)
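If the goal is to follow a single country, a two-axis line plot is another option. The sketch below is only an illustration: the numbers are made up to stand in for the aggregated df1 (after reset_index), and plot_country is a hypothetical helper name, not part of pandas or seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers standing in for the aggregated df1 (after reset_index).
df1 = pd.DataFrame({
    "country": ["Albania"] * 4 + ["Austria"] * 4,
    "year": [1987, 1988, 1989, 1992] * 2,
    "suicides_no": [73, 63, 68, 47, 2091, 1847, 1863, 1759],
    "gdp_per_capita ($)": [796, 769, 833, 251, 16696, 17928, 17750, 25834],
})

def plot_country(data, country):
    """Plot suicides_no and GDP per capita against year for one country."""
    sub = data[data["country"] == country].sort_values("year")
    fig, ax_left = plt.subplots(figsize=(8, 4))
    ax_left.plot(sub["year"], sub["suicides_no"], marker="o", color="tab:red")
    ax_left.set_xlabel("year")
    ax_left.set_ylabel("suicides_no", color="tab:red")
    ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
    ax_right.plot(sub["year"], sub["gdp_per_capita ($)"], marker="s", color="tab:blue")
    ax_right.set_ylabel("gdp_per_capita ($)", color="tab:blue")
    ax_left.set_title(country)
    return fig, ax_left, ax_right

fig, ax_left, ax_right = plot_country(df1, "Albania")
```

The two y-axes keep both series readable even though their magnitudes differ a lot; call plt.show() or fig.savefig(...) to see the result.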

Related

How to plot lineplot with several lines as countries from dataset

I ran into a problem while plotting from a GDP dataset.
I cannot figure out how to plot more than one year:
plt.figure(figsize=(14,6))
gdp = sns.lineplot(x=df_gdp['Country Name'], y=df_gdp['1995'], marker='o', color='mediumvioletred', sort=False)
for item in gdp.get_xticklabels():
    item.set_rotation(45)
plt.xticks(ha='right',fontweight='light',fontsize='large')
output:
How do I plot all years on the X axis, the amount on the Y axis, and one line per country?
How do I also modify the Y ticks to show whole numbers, not only 1-2-3-4-5-6?
You need to transform your dataframe to long-form format, then pass the relevant column names to lineplot:
df2 = df_gdp.melt(id_vars=['Country Name'], var_name='year', value_name='GDP')
sns.lineplot(x='year', y='GDP', hue='Country Name', data=df2)
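As a rough illustration of the reshaping step, here is a self-contained sketch on a tiny made-up frame. The country labels and values are invented, and it assumes 'Country Name' is the only non-year column (a real file may have extra identifier columns that also belong in id_vars). It plots with plain matplotlib so each line is visible in the code, and uses ticklabel_format to ask for full numbers on the y-axis instead of a scaled 1-2-3 axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide-format frame: one row per country, one column per year.
df_gdp = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1995": [1.2e6, 3.4e6],
    "1996": [1.5e6, 3.1e6],
    "1997": [1.9e6, 3.8e6],
})

# Wide -> long: one row per (country, year) pair.
df2 = df_gdp.melt(id_vars=["Country Name"], var_name="year", value_name="GDP")
df2["year"] = df2["year"].astype(int)  # numeric years sort and space correctly

fig, ax = plt.subplots(figsize=(8, 4))
for name, grp in df2.groupby("Country Name"):
    ax.plot(grp["year"], grp["GDP"], marker="o", label=name)  # one line per country
ax.ticklabel_format(style="plain", axis="y")  # full numbers, no scientific offset
ax.set_xticks(sorted(df2["year"].unique()))
ax.legend(title="Country Name")
```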

How to plot a stacked bar using the groupby data from the dataframe in python?

I am reading a huge CSV file using the pandas module.
filename = pd.read_csv(filepath)
Then I converted it to a DataFrame:
df = pd.DataFrame(filename, index=None)
From the CSV file, I am concerned with three columns named country, year, and value.
I have grouped by country name, summed the values, and plotted the result as a bar graph with the following code:
df.groupby('country').value.sum().plot(kind='bar')
Here the x-axis is country and the y-axis is value.
Now I want to turn this bar graph into a stacked bar chart, using the third column, year, so that differently colored segments represent each year. I am looking for an easy way to do this.
Note that the year column contains years from 2000 to 2019.
Thanks.
From what I understand, you should try something like:
df.groupby(['country', 'year']).value.sum().unstack().plot(kind='bar', stacked=True)
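To see what the unstack step produces, here is a minimal sketch on made-up data with the same three column names as in the question. The fill_value=0 argument is an addition of mine so that country/year pairs missing from the data show up as empty segments rather than NaN.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

# Made-up rows using the column names from the question.
df = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y", "Y"],
    "year": [2000, 2000, 2001, 2000, 2001, 2001],
    "value": [1, 2, 3, 4, 5, 6],
})

# Rows become countries, columns become years, cells hold the summed value.
table = df.groupby(["country", "year"])["value"].sum().unstack(fill_value=0)

# One bar per country, one colored segment per year.
ax = table.plot(kind="bar", stacked=True)
```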

Creating a Stacked Bar Chart using Groups (and sum)

This is unique as I want the stacked bar chart to reflect the actual numeric value, not counts or size as per examples
I have a dataframe
## create Data frame
DF = pd.DataFrame({ 'name':
['AA6','B7Y','CCY','AA6','B7Y','CCY','AA6','B7Y','CCY','AA6'],
'measure': [3.2,4.2,6.8,5.6,3.1,4.8,8.8,3.0,1.9,2.1]})
I want to group by name
#Create a groupby object
gb=DF.groupby(['name', 'measure'])
gb.aggregate('sum')
And then plot a bar chart of the three categories in name (AA6, B7Y & CCY), with each of the corresponding 'measure' values stacked in the order in which they occur in the data (not in the ascending order in which they appear above).
I have tried this:
DF.groupby('name').plot(kind='bar', stacked=True)
But it just creates separate plots.
If I understand what you are asking:
df = pd.DataFrame({ 'name': ['AA6','B7Y','CCY','AA6','B7Y','CCY','AA6','B7Y','CCY','AA6'],
'measure': [3.2,4.2,6.8,5.6,3.1,4.8,8.8,3.0,1.9,2.1]})
df.groupby(["name", "measure"])["measure"].sum().unstack().plot(kind="bar", stacked=True)
We are using sum to maintain the measure size in the bar plot.
If you don't need the legend, add legend=False to the plot(...).
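Grouping by the measure value itself orders the segments by value and merges equal values within a name. If the original row order matters, one possible alternative (a sketch, not the answer's method) is to number each name's rows with cumcount and pivot on that counter. The column name occurrence is a hypothetical helper introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

df = pd.DataFrame({
    "name": ["AA6", "B7Y", "CCY", "AA6", "B7Y", "CCY", "AA6", "B7Y", "CCY", "AA6"],
    "measure": [3.2, 4.2, 6.8, 5.6, 3.1, 4.8, 8.8, 3.0, 1.9, 2.1],
})

# 0, 1, 2, ... within each name, following the original row order.
df["occurrence"] = df.groupby("name").cumcount()

# One row per name, one column per occurrence; names with fewer rows get 0.
wide = df.pivot(index="name", columns="occurrence", values="measure").fillna(0)

# Segments are stacked bottom-to-top in order of occurrence.
ax = wide.plot(kind="bar", stacked=True, legend=False)
```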

How to avoid initial data changing when plotting additional data in plot

I want to plot two data series in one plot, but when I plot both, one of the series changes: Matplotlib draws lines between the wrong data points.
firsty_values and secondy_values are sorted lists of timestamps spanning one 24 h interval.
firstx_values and secondx_values are values in the range 18-21.
The first plot shows the two series together, while the last plot shows one of the series alone.
#firsty_values and secondy_values look like this:
#['2019-05-04 00:00:03',
# '2019-05-04 00:02:03',
# ...
# '2019-05-04 23:56:03',
# '2019-05-04 23:58:02']
#firstx_values and secondx_values look like this:
#[18.32,18.34 ..... 19.32,19.31]
plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length more than 1053
plt.show()
#plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length less than 1053
plt.show()
Output with both data series: [figure not shown]
Output with one data series: [figure not shown]
The first plot draws lines between data points that do not lie next to each other. The problem seems to be that some of the data points from the second series are placed out of order, after the points from the first series. This is reflected by the xticks showing three labels when plotting both series and two labels when plotting one.
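A likely cause, given that the timestamps are strings: Matplotlib treats string x-values as categories and places each new category after the ones it has already seen, so timestamps that occur only in the second series end up to the right of the first series. Converting the strings to datetimes before plotting puts both series on a shared time axis. The sketch below uses short made-up lists in place of the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Short made-up stand-ins for the lists described in the question.
firsty_values = ["2019-05-04 00:00:03", "2019-05-04 00:02:03", "2019-05-04 00:04:03"]
firstx_values = [18.32, 18.34, 18.40]
secondy_values = ["2019-05-04 00:01:00", "2019-05-04 00:03:00", "2019-05-04 00:05:00"]
secondx_values = [19.10, 19.05, 19.00]

# Parse the strings so the x-axis is a real time axis, not a list of categories.
first_times = pd.to_datetime(firsty_values)
second_times = pd.to_datetime(secondy_values)

fig, ax = plt.subplots()
ax.plot(first_times, firstx_values, "b", label="SA1_563_04_RT601A")
ax.plot(second_times, secondx_values, "g", label="SA1_563_04_RT601B")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature [C]")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

With a datetime axis, positional ticks such as plt.xticks([100, 604, 1053]) no longer apply; the default date locator usually picks reasonable tick positions.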

Visualizing pandas grouped data

Hi I am working on the following dataset
Dataset
df = pd.read_csv('https://github.com/datameet/india-election-data/blob/master/parliament-elections/parliament.csv')
df.groupby(['YEAR','PARTY'])['PC'].nunique()
How do I create a stacked bar plot with year on the x-axis, PC count on the y-axis, and party names as the stacked segment labels? Basically, I want to display the top 5 parties each year by value and bucket all other parties (excluding IND) as 'others'.
I want to visualize something like this: Election Viz
IIUC this should work:
sd = df.groupby(['YEAR','PARTY'])['PC'].nunique().reset_index()
sd.pivot(index='YEAR',values='PC',columns='PARTY').plot(kind='bar',stacked=True,figsize=(8,8))
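The snippet above stacks every party. For the "top 5 per year, everything else as others" part, one possible approach is sketched below. It assumes a frame shaped like sd (columns YEAR, PARTY, PC), uses made-up party labels and counts, and top_n_with_others is a hypothetical helper name. Note that the 'Others' bucket is the sum of per-party counts, which may count a constituency more than once.

```python
import pandas as pd

def top_n_with_others(counts, n=5, exclude=("IND",)):
    """Keep the n largest parties per year, lump the rest into 'Others'.

    counts: DataFrame with YEAR, PARTY, PC columns, one row per year/party.
    """
    counts = counts[~counts["PARTY"].isin(exclude)].copy()
    # Rank parties within each year, largest PC first (ties broken by row order).
    rank = counts.groupby("YEAR")["PC"].rank(method="first", ascending=False)
    counts["PARTY"] = counts["PARTY"].where(rank <= n, "Others")
    return counts.groupby(["YEAR", "PARTY"])["PC"].sum().unstack(fill_value=0)

# Made-up data in the same shape as sd; n=2 keeps the example small.
sd = pd.DataFrame({
    "YEAR": [2009] * 4 + [2014] * 4,
    "PARTY": ["A", "B", "C", "IND", "A", "B", "D", "E"],
    "PC": [400, 2, 30, 5, 197, 85, 143, 33],
})
table = top_n_with_others(sd, n=2)
# table.plot(kind="bar", stacked=True, figsize=(8, 8)) would draw the chart.
```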