How to plot a time-series bar chart with multiple values per stack/timestamp from a pandas DataFrame

                               value      cumsum   price
0    2021-02-01 00:00:00    164.6136    164.6136  0.0216
     2021-02-01 00:00:00    163.8085    328.4221  0.0215
     2021-02-01 00:00:00    163.0466    491.4687  0.0214
     2021-02-01 00:00:00  14999.9925  15491.4612  0.0213
1    2021-02-01 00:00:10    164.6136    164.6136  0.0216
...  ...                         ...         ...     ...
8634 2021-02-01 23:59:00  14999.9993  14999.9993  0.0221
8635 2021-02-01 23:59:10  14999.9993  14999.9993  0.0221
8636 2021-02-01 23:59:20  14999.9993  14999.9993  0.0221
8637 2021-02-01 23:59:30      0.0000      0.0000  0.0221
     2021-02-01 23:59:30  14999.9993  14999.9993  0.0221
My data looks like the above, and I would like to plot it as a bar chart with one stacked bar per timestamp, where each value at that timestamp is one segment of the stack.
Can somebody please help me?

You can use the code below to plot the graph.
import matplotlib.pyplot as plt
import pandas as pd
# create sample data
df = pd.DataFrame([['A', 10, 20, 10, 26], ['A', 10, 40, 10, 26], ['A', 10, 70, 10, 26],
                   ['B', 20, 25, 15, 21], ['B', 20, 45, 15, 21],
                   ['C', 12, 15, 19, 6],
                   ['D', 10, 18, 11, 19]],
                  columns=['Team', 'Round 1', 'Round 2', 'Round 3', 'Round 4'])
# view data
print(df)
# plot the data as stacked bars, one bar per team
# (pivot's arguments must be passed by keyword in pandas 2.0 and later)
df.pivot(index="Team", columns="Round 2").plot(kind='bar', stacked=True).legend().set_visible(False)
plt.show()
This is what you get as output. Do you expect this?
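The answer above uses made-up team data. Applied to the question's own layout, a minimal sketch could number the rows that share a timestamp with groupby().cumcount(), pivot so that each timestamp becomes one row and each of its values one column, and then draw stacked bars. It assumes the timestamps are in a column called 'timestamp' (a hypothetical name; if they are an index level, as the printout suggests, call reset_index() first) and uses only the first rows shown in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First rows copied from the question; 'timestamp' is an assumed column name
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2021-02-01 00:00:00'] * 4 + ['2021-02-01 00:00:10']),
    'value': [164.6136, 163.8085, 163.0466, 14999.9925, 164.6136],
    'price': [0.0216, 0.0215, 0.0214, 0.0213, 0.0216],
})

# Number the rows within each timestamp (0, 1, 2, ...) so each becomes its own stack segment
df['slot'] = df.groupby('timestamp').cumcount()

# One row per timestamp, one column per segment; timestamps with fewer values get 0
wide = df.pivot(index='timestamp', columns='slot', values='value').fillna(0)

# Each row of `wide` becomes one stacked bar
ax = wide.plot(kind='bar', stacked=True, legend=False)
ax.set_ylabel('value')
plt.show()
```

The height of each bar equals the last cumsum value for that timestamp (15491.4612 for the first one). With roughly 8,600 timestamps per day, one bar per timestamp gets crowded, so plotting a time slice such as wide.loc['2021-02-01 00:00':'2021-02-01 00:10'] may read better.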

Related

How to plot my data using Matplotlib with a step size

Consider the following code and the graph obtained from it
import matplotlib.pyplot as plt
import numpy as np
fig,axs = plt.subplots(figsize=(10,10))
data1 = [5, 6, 18, 7, 19]
x_ax = [10, 20, 30, 40, 50]
y_ax = [0, 5, 10, 15, 20]
axs.plot(data1,marker="o")
axs.set_xticks(x_ax)
axs.set_xticklabels(labels=x_ax,rotation=45)
axs.set_yticks(y_ax)
axs.set_yticklabels(labels=y_ax,rotation=45)
axs.set_xlabel("X")
axs.set_ylabel("Y")
axs.set_title("Name")
I need to plot data1 = [5, 6, 18, 7, 19] with an x step size of 10, i.e. 5 at x=10, 6 at 20, 18 at 30, 7 at 40 and 19 at 50. But the plot uses a step size of one.
How can I modify my code to do the required?
If you don't pass x values to plot, it automatically uses 0, 1, 2, ... as the x positions.
So in your case you need:
x = range(10, len(data1)*10+1, 10)
axs.plot(x, data1, marker="o")
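Putting the answer together with the question's axis setup, a minimal sketch might look like this. Passing x explicitly is the only essential change; the ticks are set from the same list so that every point gets a label.

```python
import matplotlib.pyplot as plt

data1 = [5, 6, 18, 7, 19]
# x positions 10, 20, ..., 50: one per data point, step size 10
x = list(range(10, len(data1) * 10 + 1, 10))

fig, axs = plt.subplots(figsize=(10, 10))
axs.plot(x, data1, marker="o")  # explicit x instead of the default 0, 1, 2, ...
axs.set_xticks(x)               # one tick per data point
axs.set_xticklabels(labels=x, rotation=45)
axs.set_xlabel("X")
axs.set_ylabel("Y")
axs.set_title("Name")
plt.show()
```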

Transform and reshape a Data Frame from wide to long with additional column

I have a data frame that I want to transform from wide into a long format. But I do not want to use all columns.
In detail, I want to melt the following data frame
import pandas as pd
data = {'year': [2014, 2018,2020,2017],
'model':[12, 14,21,8],
'amount': [100, 120,80,210],
'quality': ["low", "high","medium","high"]
}
# build the DataFrame; the dict keys become the column names
df = pd.DataFrame.from_dict(data)
print(df)
into this data frame:
data2 = {'year': [2014, 2014, 2018, 2018, 2020, 2020, 2017, 2017],
'variable': ["model", "amount", "model", "amount", "model", "amount", "model", "amount"],
'value':[12, 100, 14, 120, 21, 80, 8, 210],
'quality': ["low", "low", "high", "high", "medium", "medium", "high", "high"]
}
# build the DataFrame; the dict keys become the column names
df2 = pd.DataFrame.from_dict(data2)
print(df2)
I tried pd.melt() with different combinations of the input parameters, and it somewhat works if I ignore the quality column. But the desired result needs the quality column, so I cannot skip it. I also tried df.pivot(), df.pivot_table(), and pd.wide_to_long(), each in several combinations, but I don't get the desired result. Would it help to move the year and quality columns into the index before calling pd.melt()?
Thank you very much for your help in advance!
import pandas as pd
data = {'year': [2014, 2018,2020,2017],
'model':[12, 14,21,8],
'amount': [100, 120,80,210],
'quality': ["low", "high","medium","high"]
}
# build the DataFrame; the dict keys become the column names
df = pd.DataFrame.from_dict(data)
print(df)
df3 = pd.melt(df, id_vars=['year', 'quality'], var_name='variable', value_name='value')
df3 = df3[['year', 'variable', 'value', 'quality']]
df3 = df3.sort_values('year', kind='stable')  # a stable sort keeps 'model' before 'amount' within each year
print(df3)
Output (for df3):
year variable value quality
0 2014 model 12 low
4 2014 amount 100 low
3 2017 model 8 high
7 2017 amount 210 high
1 2018 model 14 high
5 2018 amount 120 high
2 2020 model 21 medium
6 2020 amount 80 medium
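Sorting by year puts 2017 before 2018, so df3's rows come out in a different order than df2. If the original row order should be kept, one possible sketch is to melt with ignore_index=False (available since pandas 1.1), so each melted row keeps the label of the row it came from, and then stable-sort on that index.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2018, 2020, 2017],
                   'model': [12, 14, 21, 8],
                   'amount': [100, 120, 80, 210],
                   'quality': ["low", "high", "medium", "high"]})

long = (
    df.melt(id_vars=['year', 'quality'], value_vars=['model', 'amount'],
            var_name='variable', value_name='value',
            ignore_index=False)     # keep each row's original index label
      .sort_index(kind='stable')    # regroup by source row; stable keeps model before amount
      .reset_index(drop=True)
      .reindex(columns=['year', 'variable', 'value', 'quality'])
)
print(long)
```

This yields the rows in the same order as df2 in the question (2014, 2018, 2020, 2017, each with model first).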

Fill column in df.A based on comparison values in df.A and df.B

So I have this code:
import pandas as pd
import numpy as np
frame1 = {'Season': ['S19', 'S20', 'S21',
'S19', 'S20', 'S21',
'S19', 'S20', 'S21'],
'DateFrom': ['2019-01-01', '2020-01-01', '2021-01-01',
'2019-01-01', '2020-01-01', '2021-01-01',
'2019-01-01', '2020-01-01', '2021-01-01'],
'DateTo': ['2019-12-30', '2020-12-30', '2021-12-30',
'2019-12-30', '2020-12-30', '2021-12-30',
'2019-12-30', '2020-12-30', '2021-12-30'],
'Currency': ['EUR', 'EUR', 'EUR',
'USD', 'USD', 'USD',
'MAD', 'MAD', 'MAD'],
'Rate': [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
df1 = pd.DataFrame(data=frame1)
frame2 = {'Room': ['Double', 'Single', 'SeaView'],
'Season': ['S20', 'S20', 'S19'],
'DateFrom': ['2020-05-01', '2020-07-05', '2019-03-25'],
'Currency': ['EUR', 'MAD', 'USD'],
'Rate': [0, 0, 0]
}
df2 = pd.DataFrame(data=frame2)
df1[['DateFrom', 'DateTo']] = df1[['DateFrom', 'DateTo']].apply(pd.to_datetime)
df2[['DateFrom']] = df2[['DateFrom']].apply(pd.to_datetime)
print(df1.dtypes)
print(df2.dtypes)
df2['Rate'] = np.where((
df2['Season'] == df1['Season'] &
df2['Currency'] == df1['Currency'] &
(df2['DateFrom'] > df1['DateFrom'] & df2['DateFrom'] < df1['DateTo'])
), df1['Rates'], 'MissingData')
print(df2)
What I am trying to achieve is to fill Rate values in df2 with Rate values from df1 based on conditions where:
df2.Season == df1.Season &
df2.Currency == df1.Currency &
df2.DateFrom must be between df1.DateFrom and df1.DateTo
So the resulting Rate values in df2 should be 2, 8, 4.
I was hoping the code above would work, but it doesn't; I get this error:
"TypeError: unsupported operand type(s) for &: 'str' and 'str'"
Any help making it work would be appreciated.
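Comparing the two frames directly can't work. First, & binds more tightly than ==, so df1['Season'] & df2['Currency'] is evaluated first, which causes the TypeError on strings. Second, even with parentheses, df1 and df2 have different lengths (9 vs. 3 rows), and pandas only compares identically labeled Series. Instead, you can first merge, then compare: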
out = df1.merge(df2[['Season','Currency','DateFrom']],on=['Season','Currency'],
suffixes=('','_y'))
out = (out[out['DateFrom_y'].between(out['DateFrom'],out['DateTo'])]
.reindex(columns=df1.columns).copy())
print(out)
Season DateFrom DateTo Currency Rate
0 S20 2020-01-01 2020-12-30 EUR 2
1 S19 2019-01-01 2019-12-30 USD 4
2 S20 2020-01-01 2020-12-30 MAD 8
EDIT per comments:
out = df1.merge(df2,on=['Season','Currency'],suffixes=('','_y'))
out = (out[out['DateFrom_y'].between(out['DateFrom'],out['DateTo'])]
.reindex(columns=df2.columns).copy())
# 'DateFrom' here is df1's season start; df2's own date is in 'DateFrom_y'
print(out)
Room Season DateFrom Currency Rate
0 Double S20 2020-01-01 EUR 2
1 SeaView S19 2019-01-01 USD 4
2 Single S20 2020-01-01 MAD 8
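The edited version returns df1's season start in DateFrom (df2's own date ends up in DateFrom_y) and drops df2 rows without a matching season. If the goal is literally to fill df2['Rate'] while keeping df2's rows and order, a sketch along the same merge-then-filter lines could carry df2's row labels through the merge and write the matched rates back. It assumes at most one season matches each df2 row; unmatched rows get NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Season': ['S19', 'S20', 'S21'] * 3,
    'DateFrom': pd.to_datetime(['2019-01-01', '2020-01-01', '2021-01-01'] * 3),
    'DateTo': pd.to_datetime(['2019-12-30', '2020-12-30', '2021-12-30'] * 3),
    'Currency': ['EUR'] * 3 + ['USD'] * 3 + ['MAD'] * 3,
    'Rate': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
df2 = pd.DataFrame({
    'Room': ['Double', 'Single', 'SeaView'],
    'Season': ['S20', 'S20', 'S19'],
    'DateFrom': pd.to_datetime(['2020-05-01', '2020-07-05', '2019-03-25']),
    'Currency': ['EUR', 'MAD', 'USD'],
    'Rate': [0, 0, 0],
})

# reset_index() keeps df2's row labels in an 'index' column;
# df1's overlapping DateFrom column is renamed to DateFrom_season
merged = (df2.drop(columns='Rate').reset_index()
             .merge(df1, on=['Season', 'Currency'], suffixes=('', '_season')))

# Keep only the season whose date range contains the room's DateFrom (bounds inclusive)
in_range = merged['DateFrom'].between(merged['DateFrom_season'], merged['DateTo'])
rates = merged.loc[in_range].set_index('index')['Rate']

# Align back to df2's rows; rows without a matching season become NaN
df2['Rate'] = rates.reindex(df2.index)
print(df2)
```

To get the question's 'MissingData' marker instead of NaN, .fillna('MissingData') could be appended to the last assignment, at the cost of an object-dtype column.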

Timeseries: Groupby and calculate variance

I have the following dataframe with timeseries data:
df = pd.DataFrame(columns = ['id', 'value'])
df['value'] =[9, 16, 10, 12, 11, 14]
df['id'] = [1, 1, 1, 2, 2, 2]
For each time series (defined by the column 'id'), I want to calculate the variance, to find the time series that do not change at all or change only very little.
The final dataframe should look like this:
df_end = pd.DataFrame(columns = ['id','value', 'var'])
df_end['value'] =[9, 16, 10, 12, 11, 14]
df_end['id'] = [1, 1, 1, 2, 2, 2]
df_end['var'] = [14.3, 14.3, 14.3, 2.3, 2.3, 2.3]
I tried:
df.groupby(df['id']).var()
which gives me the values, but I couldn't get them back into the DataFrame in the right shape. I'm sure there is a handy function for this that I don't know about yet!
Thanks for helping out!
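Use GroupBy.transform on the 'value' column; unlike calling .var() on the groupby, it returns one result per original row, so it can be assigned straight back as a new column:
df['var'] = df.groupby('id')['value'].transform('var')
print(df)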
id value var
0 1 9 14.333333
1 1 16 14.333333
2 1 10 14.333333
3 2 12 2.333333
4 2 11 2.333333
5 2 14 2.333333
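pandas' var computes the sample variance (ddof=1 by default), which is why id 1 gets 14.33; pass ddof=0 to .var() for the population variance. For the stated goal of finding series that barely change, a simple follow-up is to compare the new column against a cut-off. The THRESHOLD below is a hypothetical value chosen for this toy data, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'value': [9, 16, 10, 12, 11, 14]})

# Sample variance per id, broadcast back to every row of that id
df['var'] = df.groupby('id')['value'].transform('var')

# Hypothetical cut-off for "changes only very little"; tune it for real data
THRESHOLD = 5.0
flat_ids = df.loc[df['var'] < THRESHOLD, 'id'].unique().tolist()
print(flat_ids)
```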

Pandas plot multiple category lines

Say I have the following data...
date Score category
2017-01-01 50.0 1
2017-01-01 590.0 2
2017-01-02 30.0 1
2017-01-02 210.4 2
2017-01-03 11.0 1
2017-01-03 50.3 2
So on a daily basis, I have multiple categories, each being assigned a score.
Here's my code so far...
import pandas as pd
vals = [{'date': '2017-01-01', 'category': 1, 'Score': 50},
        {'date': '2017-01-01', 'category': 2, 'Score': 590},
        {'date': '2017-01-02', 'category': 1, 'Score': 30},
        {'date': '2017-01-02', 'category': 2, 'Score': 210.4},
        {'date': '2017-01-03', 'category': 1, 'Score': 11},
        {'date': '2017-01-03', 'category': 2, 'Score': 50.3}]
df = pd.DataFrame(vals)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.set_index(['date'], inplace=True)
Plotting this DataFrame gives a bizarre plot.
I'd like to have multiple lines, one for each category, and the date on the X-axis - how would I do this?
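You can use groupby and plot each group on the same axes:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# one line per category; the date index is used for the x-axis
for label, grp in df.groupby('category'):
    grp.plot(y='Score', ax=ax, label=label)
plt.show()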
Alternatively, draw both columns on the same axes (ax=ax), with category on a secondary y-axis (secondary_y=True):
ax = df.plot(y='Score')
df.plot(y='category', secondary_y=True, ax=ax)
Or, if Vaishali's plot is what you want, you can do it with this one-liner:
df.set_index('category',append=True).unstack()['Score'].plot()
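Since the date is already the index, another option along the same lines is DataFrame.pivot, which turns each category into its own column; DataFrame.plot then draws one line per column. A minimal sketch using the question's data:

```python
import matplotlib.pyplot as plt
import pandas as pd

vals = [{'date': '2017-01-01', 'category': 1, 'Score': 50},
        {'date': '2017-01-01', 'category': 2, 'Score': 590},
        {'date': '2017-01-02', 'category': 1, 'Score': 30},
        {'date': '2017-01-02', 'category': 2, 'Score': 210.4},
        {'date': '2017-01-03', 'category': 1, 'Score': 11},
        {'date': '2017-01-03', 'category': 2, 'Score': 50.3}]
df = pd.DataFrame(vals)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date')

# Rows: dates (the existing index); columns: one per category; cells: Score
wide = df.pivot(columns='category', values='Score')
ax = wide.plot(marker='o')  # one line per category, dates on the x-axis
ax.set_ylabel('Score')
plt.show()
```

Unlike the unstack one-liner, pivot raises an error if a date/category pair occurs twice; pivot_table with an aggregation function would be the fallback in that case.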