seaborn multiple variables group bar plot - pandas

I have a pandas DataFrame with a datetime index and three integer variables:
date A B C
2017-09-05 25 261 31
2017-09-06 261 1519 151
2017-09-07 188 1545 144
2017-09-08 200 2110 232
2017-09-09 292 2391 325
I can create a grouped bar plot with the basic pandas plot:
df.plot(kind='bar', legend=False)
However, I want to plot it in Seaborn or another library to improve my skills. I found a very close answer (Pandas: how to draw a bar plot with two categories and four series each?).
Its suggested answer has this code:
ax=sns.barplot(x='', y='', hue='', data=data)
If I apply this code to my data, I do not know what my 'y' would be.
ax=sns.barplot(x='date', y=??, hue=??, data=data)
How can I plot multiple variables with Seaborn or other libraries?

I think you need melt if you want to use barplot:
data = df.melt('date', var_name='a', value_name='b')
print (data)
date a b
0 2017-09-05 A 25
1 2017-09-06 A 261
2 2017-09-07 A 188
3 2017-09-08 A 200
4 2017-09-09 A 292
5 2017-09-05 B 261
6 2017-09-06 B 1519
7 2017-09-07 B 1545
8 2017-09-08 B 2110
9 2017-09-09 B 2391
10 2017-09-05 C 31
11 2017-09-06 C 151
12 2017-09-07 C 144
13 2017-09-08 C 232
14 2017-09-09 C 325
ax=sns.barplot(x='date', y='b', hue='a', data=data)
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
Pandas solution with DataFrame.plot.bar and set_index:
df.set_index('date').plot.bar()
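The question describes date as the index, while melt('date') expects date to be a regular column. A minimal sketch of the reshape under that assumption follows. The frame is rebuilt from the question's sample, and the column names series and value are illustrative choices, not taken from the answer above.

```python
import pandas as pd

# Rebuild the question's sample with 'date' as the index (as the question describes).
df = pd.DataFrame(
    {
        "A": [25, 261, 188, 200, 292],
        "B": [261, 1519, 1545, 2110, 2391],
        "C": [31, 151, 144, 232, 325],
    },
    index=pd.to_datetime(
        ["2017-09-05", "2017-09-06", "2017-09-07", "2017-09-08", "2017-09-09"]
    ),
)
df.index.name = "date"

# reset_index() turns the index into a 'date' column so melt can use it as id_vars.
long_df = df.reset_index().melt(id_vars="date", var_name="series", value_name="value")

# Optional: plain date strings give shorter tick labels than full timestamps.
long_df["date"] = long_df["date"].dt.strftime("%Y-%m-%d")

print(long_df.head())
```

The resulting long_df can then be passed to Seaborn in the same way as the melted frame above, for example sns.barplot(x='date', y='value', hue='series', data=long_df).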

Related

Produce time series plots for each name pandas

I have a Pandas data frame df that looks like this:
date A B name
2022-06-25 04:00:00 700 532 aa
2022-06-25 05:00:00 1100 433 aa
2022-06-25 06:00:00 800 754 aa
2022-06-25 07:00:00 1200 288 aa
2022-06-25 08:00:00 700 643 bb
2022-06-25 09:00:00 1400 668 bb
2022-06-25 10:00:00 1600 286 bb
.....
2022-06-03 11:00:00 397 46 cc
2022-06-03 12:00:00 100 7 cc
2022-06-03 13:00:00 400 25 cc
2022-06-03 14:00:00 500 41 cc
2022-06-03 15:00:00 400 0 cc
2022-06-03 16:00:00 300 23 dd
2022-06-03 17:00:00 500 50 dd
2022-06-03 18:00:00 300 0 dd
2022-06-03 19:00:00 400 15 dd
I'm trying to produce time series plots (line charts) for each name:
number of daily A vs date
number of daily B vs date
I was able to do so and get a plot for each name like this:
df.groupby('name').plot(x='date', figsize=(24,12))
But I couldn't set a title for each plot with plt.title(" "),
and I couldn't save each plot automatically with plt.savefig('name.png'),
because all the plots are produced at once.
Is there any other way to produce plots so I can specify the title and save each plot automatically?
Thank you
One way to do this is to put the plot creation and the save inside a for loop. That lets you set a title through the title argument and use savefig to save each plot individually. The updated code and one of the output graphs are shown below. Note that I am using name as the title and saving the figure as Myplot <name>.png
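import matplotlib.pyplot as plt
for name, group in df.groupby('name'):
    group.plot(x='date', title=name)  # one figure per name, titled with the name
    plt.savefig('Myplot {}.png'.format(name), bbox_inches='tight')  # saves the current figure
    plt.close()  # release the figure before the next iteration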
[Figure: first plot]

How to extract the last value of a group in pandas dataframe

I have a large dataset that needs to be grouped by its 'ID' column, and only the last value of each ID should be exported into a SINGLE csv/excel file.
incl = ['A', 'B', 'C']
for k, g in df[df['ID'].isin(incl)].groupby('ID'):
    g.tail(1).to_csv(f'{k}.csv')
I have tried this, but it writes a separate csv file for each ID instead of one big file containing the last value of each group.
Sample data:
ID Date Open High Low
30 UNITY 2020-06-18 11.50 11.75 11.41
31 UNITY 2020-06-21 11.44 11.50 10.88
32 UNITY 2020-06-22 11.26 11.78 11.26
33 UNITY 2020-06-23 11.72 12.08 11.53
34 UNITY 2020-06-24 11.51 11.59 11.40
35 UNITY 2020-06-25 11.85 11.85 11.11
36 SSOM 2020-05-03 27.50 27.95 27.00
37 SSOM 2020-05-05 27.50 27.50 27.50
38 SSOM 2020-05-06 29.20 29.56 29.20
39 SSOM 2020-05-07 31.77 31.77 31.77
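One possible approach, sketched here under assumptions, is to call tail(1) on the grouped frame itself rather than inside a loop, so that the last row of every ID lands in one DataFrame that is written once. The frame below is a shortened copy of the sample data, the file name last_values.csv is hypothetical, and the sketch assumes rows are already in date order within each ID (otherwise sort by Date first).

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame(
    {
        "ID": ["UNITY", "UNITY", "UNITY", "SSOM", "SSOM"],
        "Date": pd.to_datetime(
            ["2020-06-23", "2020-06-24", "2020-06-25", "2020-05-06", "2020-05-07"]
        ),
        "Open": [11.72, 11.51, 11.85, 29.20, 31.77],
        "High": [12.08, 11.59, 11.85, 29.56, 31.77],
        "Low": [11.53, 11.40, 11.11, 29.20, 31.77],
    }
)

incl = ["UNITY", "SSOM"]  # IDs to keep (illustrative values from the sample)

# groupby(...).tail(1) returns the last row of each group as a single DataFrame,
# keeping the original row order and all columns.
last_rows = df[df["ID"].isin(incl)].groupby("ID").tail(1)

# One file for all groups; 'last_values.csv' is a hypothetical name.
last_rows.to_csv("last_values.csv", index=False)
print(last_rows)
```

If the rows are not guaranteed to be ordered, calling df.sort_values(['ID', 'Date']) before the groupby makes "last" mean "latest date".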

Future dates calculating incorrectly in FBProphet - make_future_dataframe method

I'm trying to do a weekly forecast in FBProphet for just 5 weeks ahead. The make_future_dataframe method doesn't seem to be working right: it produces the correct one-week intervals except for the step between Jul 3 and Jul 5. Every other interval is correct at 7 days. Code and output below:
INPUT DATAFRAME
ds y
548 2010-01-01 3117
547 2010-01-08 2850
546 2010-01-15 2607
545 2010-01-22 2521
544 2010-01-29 2406
... ... ...
4 2020-06-05 2807
3 2020-06-12 2892
2 2020-06-19 3012
1 2020-06-26 3077
0 2020-07-03 3133
CODE
future = m.make_future_dataframe(periods=5, freq='W')
future.tail(9)
OUTPUT
ds
545 2020-06-12
546 2020-06-19
547 2020-06-26
548 2020-07-03
549 2020-07-05
550 2020-07-12
551 2020-07-19
552 2020-07-26
553 2020-08-02
All you need to do is create a dataframe with the dates you need for the predict method. Using the make_future_dataframe method is not necessary.
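A minimal sketch of building those dates by hand, using only pandas. In pandas, freq='W' is an alias for 'W-SUN', so generated dates snap to Sundays, while the history in the question falls on Fridays; that would explain the two-day step from 2020-07-03 to 2020-07-05. The sketch below assumes the last observed date is 2020-07-03, as in the input frame, and steps forward in fixed 7-day increments.

```python
import pandas as pd

# Assumption: last observed date, taken from the question's input frame.
last_observed = pd.Timestamp("2020-07-03")

# Five future dates, exactly 7 days apart, starting one week after the last observation.
future = pd.DataFrame(
    {
        "ds": pd.date_range(
            start=last_observed + pd.Timedelta(days=7), periods=5, freq="7D"
        )
    }
)

print(future)
```

This frame holds only the future dates, so passing it to m.predict(future) (with m being the fitted model from the question) would forecast just those five weeks. Passing freq='W-FRI' to make_future_dataframe is likely to give the same dates, assuming the frequency string is forwarded to pandas unchanged.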

How to merge two DataFrames on dates that differ by one day?

Input
df1
id A
2020-01-01 10
2020-02-07 20
2020-04-09 30
df2
id B
2019-12-31 50
2020-02-06 20
2020-02-07 70
2020-04-08 34
2020-04-09 44
Goal
df
id A B
2020-01-01 10 50
2020-02-07 20 20
2020-04-09 30 34
The details are as follows:
df1 is merged with df2 on id, which adds the columns from df2.
The type of id is datetime.
Merge rule: each df1 row is matched to the df2 row dated one day earlier (yesterday).
Could you simply add 1 day to df2's id column before merging?
df1.merge(df2.assign(id=df2['id'] + pd.Timedelta(days=1)), on='id')
id A B
0 2020-01-01 10 50
1 2020-02-07 20 20
2 2020-04-09 30 34
Try pd.merge_asof
df = pd.merge_asof(df1,df2,on='id',tolerance=pd.Timedelta('1 day'),allow_exact_matches=False)
id A B
0 2020-01-01 10 50
1 2020-02-07 20 20
2 2020-04-09 30 34
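For comparison, here is a self-contained sketch that rebuilds the question's two frames and runs both suggestions side by side. It assumes id is a regular datetime column in both frames and that both are sorted by id, which merge_asof requires.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": pd.to_datetime(["2020-01-01", "2020-02-07", "2020-04-09"]),
        "A": [10, 20, 30],
    }
)
df2 = pd.DataFrame(
    {
        "id": pd.to_datetime(
            ["2019-12-31", "2020-02-06", "2020-02-07", "2020-04-08", "2020-04-09"]
        ),
        "B": [50, 20, 70, 34, 44],
    }
)

# Approach 1: shift df2 forward by one day, then do an ordinary (inner) merge.
shifted = df1.merge(df2.assign(id=df2["id"] + pd.Timedelta(days=1)), on="id")

# Approach 2: match each df1 row to the nearest earlier df2 row within one day,
# excluding same-day matches.
asof = pd.merge_asof(
    df1,
    df2,
    on="id",
    tolerance=pd.Timedelta("1 day"),
    allow_exact_matches=False,
)

print(shifted)
print(asof)
```

The two approaches differ when a df1 date has no df2 row on the previous day: the plain merge defaults to an inner join and drops that row, while merge_asof behaves like a left join and keeps it with NaN in B.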

How do I access only specific entries of a DataFrame that has dates as its index?

This is the tail of my DataFrame, which has around 1000 entries:
Open Close High Change mx_profitable
Date
2018-06-06 263.00 270.15 271.4 7.15 8.40
2018-06-08 268.95 273.00 273.9 4.05 4.95
2018-06-11 273.30 274.00 278.4 0.70 5.10
2018-06-12 274.00 282.85 284.4 8.85 10.40
I need to select only the entries for certain dates, for example the 25th of every month.
I think you need DatetimeIndex.day with boolean indexing:
df[df.index.day == 25]
Sample:
rng = pd.date_range('2017-04-03', periods=1000)
df = pd.DataFrame({'a': range(1000)}, index=rng)
print (df.head())
a
2017-04-03 0
2017-04-04 1
2017-04-05 2
2017-04-06 3
2017-04-07 4
df1 = df[df.index.day == 25]
print (df1.head())
a
2017-04-25 22
2017-05-25 52
2017-06-25 83
2017-07-25 113
2017-08-25 144
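The same idea extends to several days of the month at once. A small sketch, reusing the sample frame above; the list of days is an illustrative choice, and isin is applied to the integer index returned by DatetimeIndex.day.

```python
import pandas as pd

rng = pd.date_range("2017-04-03", periods=1000)
df = pd.DataFrame({"a": range(1000)}, index=rng)

# Illustrative choice: keep the 1st and the 25th of every month.
days = [1, 25]
subset = df[df.index.day.isin(days)]

print(subset.head())
```

On data with gaps, such as trading days only, a month in which the requested day is missing simply contributes no row.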