Pandas plot multiple category lines - pandas

Say I have the following data...
date        Score  category
2017-01-01   50.0  1
2017-01-01  590.0  2
2017-01-02   30.0  1
2017-01-02  210.4  2
2017-01-03   11.0  1
2017-01-03   50.3  2
So on a daily basis, I have multiple categories, each being assigned a score.
Here's my code so far...
import pandas as pd

vals = [{'date': '2017-01-01', 'category': 1, 'Score': 50},
        {'date': '2017-01-01', 'category': 2, 'Score': 590},
        {'date': '2017-01-02', 'category': 1, 'Score': 30},
        {'date': '2017-01-02', 'category': 2, 'Score': 210.4},
        {'date': '2017-01-03', 'category': 1, 'Score': 11},
        {'date': '2017-01-03', 'category': 2, 'Score': 50.3}]
df = pd.DataFrame(vals)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.set_index('date', inplace=True)
Plotting this results in a bizarre plot.
I'd like to have multiple lines, one for each category, and the date on the X-axis - how would I do this?

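You can use groupby and plot each group on the same axes:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for label, grp in df.groupby('category'):
    # 'date' is already the index, so it is used for the x-axis automatically
    grp.plot(y='Score', ax=ax, label=label)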

Let's try using axes with ax=ax and parameter secondary_y=True:
# 'date' is the index, so it is used for the x-axis without passing x=
ax = df.plot(y='Score')
df.plot(y='category', secondary_y=True, ax=ax)
Or, if @Vaishali's plot is what you want, you can do it with this one-liner:
df.set_index('category',append=True).unstack()['Score'].plot()
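To see what the one-liner builds before plotting, here is a minimal sketch that rebuilds the sample data from the question and performs only the reshaping step. The name wide is just illustrative. Each category ends up as its own column, and DataFrame.plot draws one line per column.

```python
import pandas as pd

vals = [{'date': '2017-01-01', 'category': 1, 'Score': 50},
        {'date': '2017-01-01', 'category': 2, 'Score': 590},
        {'date': '2017-01-02', 'category': 1, 'Score': 30},
        {'date': '2017-01-02', 'category': 2, 'Score': 210.4},
        {'date': '2017-01-03', 'category': 1, 'Score': 11},
        {'date': '2017-01-03', 'category': 2, 'Score': 50.3}]
df = pd.DataFrame(vals)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date')

# Move 'category' into the index next to the date, then pivot it out
# into columns: one row per date, one column per category.
wide = df.set_index('category', append=True)['Score'].unstack()
print(wide)

# wide.plot() would then draw one line per category with dates on the x-axis.
```

This relies on each (date, category) pair appearing only once; with duplicates, unstack raises an error and the values would need to be aggregated first.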

Related

Groupby sum and difference of rows in a pandas dataframe

I have a dataframe:
df = pd.DataFrame({
'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
'Product': ['AA', 'AA', 'BB', 'AA'],
'Risk': ['High', 'High','Low', 'High'],
'202101': [ 130, 200, 190, 210],
'202102': [ 130, 200, 190, 210],
'202103': [ 130, 200, 190, 210],})
I would like to group by Product and Risk, sum the entries in Total Assets and Total Promo, and subtract the entries in Total Int from that sum. I could multiply all rows with Total Int by -1 and sum the result, but I wanted to know if there was a more direct way to do so.
df.groupby(['Product', 'Risk']).sum()
The actual dataset is large, and multiplying certain rows by -1 would introduce complexity.
The output would look like:
df = pd.DataFrame({
'Product': ['AA', 'BB'],
'Risk': ['High', 'Low'],
'202101': [ 120, 190],
'202102': [ 120, 190],
'202103': [ 120, 190],})
You can multiply by -1 your Total Int rows:
df.loc[df['Metric'] == 'Total Int', df.select_dtypes('number').columns] *= -1
# OR
df.loc[df['Metric'] == 'Total Int', df.filter(regex=r'\d{6}').columns] *= -1
>>> df.groupby(['Product', 'Risk']).sum(numeric_only=True)
              202101  202102  202103
Product Risk
AA      High     120     120     120
BB      Low      190     190     190
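If modifying the original rows is the concern, one possible variation (a sketch, not the answerer's code) is to keep df untouched and multiply by a separate sign Series while grouping. The names value_cols, sign and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
    'Product': ['AA', 'AA', 'BB', 'AA'],
    'Risk': ['High', 'High', 'Low', 'High'],
    '202101': [130, 200, 190, 210],
    '202102': [130, 200, 190, 210],
    '202103': [130, 200, 190, 210],
})

value_cols = ['202101', '202102', '202103']

# +1 for rows that are added, -1 for rows that are subtracted.
sign = pd.Series(np.where(df['Metric'].eq('Total Int'), -1, 1), index=df.index)

# Scale each row by its sign, then sum within each Product/Risk group.
result = (df[value_cols]
          .mul(sign, axis=0)
          .groupby([df['Product'], df['Risk']])
          .sum())
print(result)
```

The original df still holds the positive Total Int values afterwards, since the multiplication happens on a temporary copy.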
In your actual dataset, do you have any groups that only have one row? The following solution works only if every group has more than one row, so that diff() doesn't return NaN. This is why the second row of the expected output (BB, Low) is missing below, but I imagine the groups in your large dataset have more than one row.
IIUC, create an array s that differentiates the two sets of metrics, then take the diff of the grouped sums:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
'Product': ['AA', 'AA', 'BB', 'AA'],
'Risk': ['High', 'High','Low', 'High'],
'Col1': [ 130, 200, 190, 210],
'Col2': [ 130, 200, 190, 210],
'Col3': [ 130, 200, 190, 210],})
s = np.where(df['Metric'].isin(['Total Assets', 'Total Promo']), 'B', 'A')
cols = ['Product', 'Risk']
# numeric_only=True keeps the string 'Metric' column out of the sum
(df.groupby(cols + [s]).sum(numeric_only=True)
   .groupby(cols).diff()
   .dropna().reset_index().drop('level_2', axis=1))
Out[1]:
  Product  Risk   Col1   Col2   Col3
0      AA  High  120.0  120.0  120.0
How about this as a solution?
(df.
    melt(['Metric', 'Product', 'Risk']).
    pivot(index=['Product', 'Risk', 'variable'], columns='Metric', values='value').
    assign(Total=lambda df: df['Total Assets'].fillna(0) + df['Total Promo'].fillna(0) - df['Total Int'].fillna(0)).
    drop(columns=['Total Assets', 'Total Promo', 'Total Int']).
    reset_index().
    pivot(index=['Product', 'Risk'], columns='variable', values='Total')
)
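A related sketch, assuming the same sample data as the question: sum per Product, Risk and Metric first, move Metric into the columns, and then combine the metrics explicitly. The helper metric and the names value_cols and per_metric are hypothetical. Because missing metrics are filled with 0, groups with a single row (such as BB/Low) are kept.

```python
import pandas as pd

df = pd.DataFrame({
    'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Int'],
    'Product': ['AA', 'AA', 'BB', 'AA'],
    'Risk': ['High', 'High', 'Low', 'High'],
    '202101': [130, 200, 190, 210],
    '202102': [130, 200, 190, 210],
    '202103': [130, 200, 190, 210],
})

value_cols = ['202101', '202102', '202103']

# One row per Product/Risk; columns become (month, Metric) pairs.
per_metric = (df.groupby(['Product', 'Risk', 'Metric'])[value_cols]
                .sum()
                .unstack('Metric', fill_value=0))

def metric(name):
    # Select one metric across all month columns.
    return per_metric.xs(name, axis=1, level='Metric')

result = metric('Total Assets') + metric('Total Promo') - metric('Total Int')
print(result)
```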

Groupby and Divide One Group of Rows by Another Group

I have a dataframe:
df = pd.DataFrame({
'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Promo'],
'Product': ['AA', 'AA', 'BB', 'BB'],
'Risk': ['High', 'High','Low', 'Low'],
'202101': [ 200, 100, 400, 100],
'202102': [ 200, 100, 400, 100],
'202103': [ 200, 100, 400, 100]})
I wish to group by Product and Risk and divide the Total Assets rows by the Total Promo rows. I would like the output to look like this:
df = pd.DataFrame({
'Product': ['AA', 'BB'],
'Risk': ['High', 'Low',],
'202101': [ 2, 4],
'202102': [ 2, 4],
'202103': [ 2, 4]})
So far my approach has been to first melt into long form, but I can't seem to get Total Assets and Total Promo into separate columns so that I can divide one by the other:
df = pd.melt(df, id_vars=['Metric', 'Product', 'Risk'],
             value_vars=["202101", "202102", "202103"],
             var_name='Months', value_name='Balance')
Here's one way:
df1 = df.set_index(['Metric', 'Product', 'Risk']).stack().unstack(0)
df = (df1['Total Assets'] / df1['Total Promo']).unstack(-1).reset_index()
OUTPUT:
  Product  Risk  202101  202102  202103
0      AA  High     2.0     2.0     2.0
1      BB   Low     4.0     4.0     4.0
Since there are only two rows per grouping and they are ordered, a groupby with the relevant columns, combined with pipe should suffice:
(df.iloc[:, 1:]
   .groupby(['Product', 'Risk'])
   .pipe(lambda df: df.first()/df.last())
)
              202101  202102  202103
Product Risk
AA      High     2.0     2.0     2.0
BB      Low      4.0     4.0     4.0
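The melt attempt from the question can also be carried through. The sketch below (with illustrative names long_df, by_metric and ratio) continues from the long form, turns the Metric values into columns, and only then divides. Unlike the first()/last() approach, it does not depend on row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Metric': ['Total Assets', 'Total Promo', 'Total Assets', 'Total Promo'],
    'Product': ['AA', 'AA', 'BB', 'BB'],
    'Risk': ['High', 'High', 'Low', 'Low'],
    '202101': [200, 100, 400, 100],
    '202102': [200, 100, 400, 100],
    '202103': [200, 100, 400, 100],
})

# Long form, as in the question: one row per Metric/Product/Risk/month.
long_df = pd.melt(df, id_vars=['Metric', 'Product', 'Risk'],
                  value_vars=['202101', '202102', '202103'],
                  var_name='Months', value_name='Balance')

# Turn the Metric values into columns so they can be divided.
by_metric = (long_df
             .set_index(['Product', 'Risk', 'Months', 'Metric'])['Balance']
             .unstack('Metric'))

# Divide, then spread the months back out into columns.
ratio = ((by_metric['Total Assets'] / by_metric['Total Promo'])
         .unstack('Months')
         .reset_index())
print(ratio)
```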

How to assign a new column after groupby in pandas

I want to group my data and assign the result to a new column.
Given the following data frame
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2'], 'col2': [1, 2, 3, 4, 5, 6]})
df['col3']=df[['col1','col2']].groupby('col1').rolling(2).mean().reset_index()
Expected output = pd.DataFrame({'col1': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2'], 'col2': [1, 2, 3, 4, 5, 6], 'col3': [np.nan, 1.5, 2.5, np.nan, 4.5, 5.5]})
However, this does not work. Is there a straightforward way to do it?
A combination of groupby, apply and assign:
df.groupby('col1', as_index = False).apply(lambda g: g.assign(col3 = g['col2'].rolling(2).mean())).reset_index(drop = True)
output:
  col1  col2  col3
0   x1     1   NaN
1   x1     2   1.5
2   x1     3   2.5
3   x2     4   NaN
4   x2     5   4.5
5   x2     6   5.5
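Another option, sketched here as an alternative rather than as the answer above, is groupby with transform, which returns a result aligned to the original index and can be assigned directly. The second variant shows why the attempt in the question fails: a grouped rolling result carries the group key as an extra index level, which has to be dropped before assignment. The name alt is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2'],
                   'col2': [1, 2, 3, 4, 5, 6]})

# transform keeps the original index, so the result lines up row by row.
df['col3'] = df.groupby('col1')['col2'].transform(lambda s: s.rolling(2).mean())

# Equivalent: grouped rolling yields a (col1, original index) MultiIndex;
# dropping the col1 level restores alignment with df.
alt = (df.groupby('col1')['col2']
         .rolling(2).mean()
         .reset_index(level=0, drop=True))

print(df)
```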

How can rows of a pandas DataFrame all be plotted together as lines?

Let's say we have the following DataFrame:
import pandas as pd
df = pd.DataFrame(
[
['Norway' , 'beta', 30.0 , 31.0, 32.0, 32.4, 32.5, 32.1],
['Denmark' , 'beta', 75.7 , 49.1, 51.0, 52.3, 50.0, 47.9],
['Switzerland', 'beta', 46.9 , 44.0, 43.5, 42.3, 41.8, 43.4],
['Finland' , 'beta', 29.00, 29.8, 27.0, 26.0, 25.3, 24.8],
['Netherlands', 'beta', 30.2 , 30.1, 28.5, 28.2, 28.0, 28.0],
],
columns = [
'country',
'run_type',
'score A',
'score B',
'score C',
'score D',
'score E',
'score F'
]
)
df
How could the score values be plotted as lines, where each line corresponds to a country?
Since you tagged matplotlib, here is a solution using plt.plot(). The idea is to plot the lines row-wise using iloc:
import matplotlib.pyplot as plt
# define DataFrame here
df1 = df.filter(like='score')
for i in range(len(df1)):
    # one line per row, labelled with that row's country
    plt.plot(df1.iloc[i], label=df['country'][i])
plt.legend()
plt.show()
Try to plot the transpose of the dataframe:
# the score columns, modify if needed
score_cols = df.columns[df.columns.str.contains('score')]
df.set_index('country')[score_cols].T.plot()
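The transpose works because DataFrame.plot draws one line per column. As a small sketch of the intermediate table (the name wide is illustrative), the countries become the columns and the score labels become the index:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Norway',      'beta', 30.0, 31.0, 32.0, 32.4, 32.5, 32.1],
        ['Denmark',     'beta', 75.7, 49.1, 51.0, 52.3, 50.0, 47.9],
        ['Switzerland', 'beta', 46.9, 44.0, 43.5, 42.3, 41.8, 43.4],
        ['Finland',     'beta', 29.0, 29.8, 27.0, 26.0, 25.3, 24.8],
        ['Netherlands', 'beta', 30.2, 30.1, 28.5, 28.2, 28.0, 28.0],
    ],
    columns=['country', 'run_type', 'score A', 'score B', 'score C',
             'score D', 'score E', 'score F'],
)

# Rows become columns: one column per country, one row per score label.
wide = df.set_index('country').filter(like='score').T
print(wide)

# wide.plot() would then draw one line per country.
```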

Convert list of dictionaries in a dataframe to separate dataframe

I want to convert a list of dictionaries that is already present in my dataset into a dataframe.
The dataset looks something like this:
[{'id': 35, 'name': 'Comedy'}]
How do I convert this list of dictionaries to a dataframe?
Thank you for your time!
I want to retrieve:
Comedy
from the list of dictionaries.
Use:
df = pd.DataFrame({'col':[[{'id': 35, 'name': 'Comedy'}],[{'id': 35, 'name': 'Western'}]]})
print (df)
                               col
0   [{'id': 35, 'name': 'Comedy'}]
1  [{'id': 35, 'name': 'Western'}]
df['new'] = df['col'].apply(lambda x: x[0].get('name'))
print (df)
                               col      new
0   [{'id': 35, 'name': 'Comedy'}]   Comedy
1  [{'id': 35, 'name': 'Western'}]  Western
If a list may contain multiple dicts:
df = pd.DataFrame({'col':[[{'id': 35, 'name': 'Comedy'}, {'id':4, 'name':'Horror'}],
[{'id': 35, 'name': 'Western'}]]})
print (df)
                                                 col
0  [{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name...
1                    [{'id': 35, 'name': 'Western'}]
df['new'] = df['col'].apply(lambda x: [y.get('name') for y in x])
print (df)
                                                 col               new
0  [{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name...  [Comedy, Horror]
1                    [{'id': 35, 'name': 'Western'}]         [Western]
And if you want to extract all values into a separate dataframe:
df1 = pd.concat([pd.DataFrame(x) for x in df['col']], ignore_index=True)
print (df1)
   id     name
0  35   Comedy
1   4   Horror
2  35  Western
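If it matters which original row each dictionary came from, a possible variation (a sketch with illustrative names exploded and flat) is to explode the list column first. The resulting dataframe then keeps the original row label as its index instead of a fresh 0..n-1 range.

```python
import pandas as pd

df = pd.DataFrame({'col': [[{'id': 35, 'name': 'Comedy'}, {'id': 4, 'name': 'Horror'}],
                           [{'id': 35, 'name': 'Western'}]]})

# One dict per row; the index repeats the label of the row it came from.
exploded = df['col'].explode()

# Build a dataframe from the dicts, keeping that index.
flat = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(flat)
```

This assumes every list is non-empty; an empty list would explode to NaN, which would need to be dropped before building the dataframe.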