How do I access only specific entries of a DataFrame that has dates as its index? - pandas

This is the tail of my DataFrame, which has around 1000 entries:
              Open   Close   High  Change  mx_profitable
Date
2018-06-06  263.00  270.15  271.4    7.15           8.40
2018-06-08  268.95  273.00  273.9    4.05           4.95
2018-06-11  273.30  274.00  278.4    0.70           5.10
2018-06-12  274.00  282.85  284.4    8.85          10.40
I need to select only the entries for certain dates, for example the 25th of every month.

I think you need DatetimeIndex.day with boolean indexing:
df[df.index.day == 25]
Sample:
rng = pd.date_range('2017-04-03', periods=1000)
df = pd.DataFrame({'a': range(1000)}, index=rng)
print (df.head())
a
2017-04-03 0
2017-04-04 1
2017-04-05 2
2017-04-06 3
2017-04-07 4
df1 = df[df.index.day == 25]
print (df1.head())
a
2017-04-25 22
2017-05-25 52
2017-06-25 83
2017-07-25 113
2017-08-25 144
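
The same boolean mask extends to more than one day. The following is a minimal sketch that reuses the sample frame from this answer; the choice of the 1st and the 25th, and the names `on_25th`, `on_1st_or_25th` and `on_25th_2018`, are assumptions made only for illustration.

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=1000)
df = pd.DataFrame({'a': range(1000)}, index=rng)

# Rows that fall on the 25th of any month (same mask as in the answer)
on_25th = df[df.index.day == 25]

# Rows that fall on either the 1st or the 25th (days picked for illustration)
on_1st_or_25th = df[df.index.day.isin([1, 25])]

# Masks can be combined with &, e.g. the 25th of each month in 2018 only
on_25th_2018 = df[(df.index.day == 25) & (df.index.year == 2018)]
```

If the dates live in a regular column rather than in the index, the equivalent accessor would be `df['Date'].dt.day`, assuming that column has a datetime dtype.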

Related

How to find 1st of months between start and end dates and add them to a dataframe?

I have a dataset like this
import pandas as pd
df = pd.DataFrame(
{
"id": {0: 1, 1: 1, 2: 1, 3: 2, 4: 2},
"price": {0: 20, 1: 41, 2: 61, 3: 68, 4: 10},
"date_month_start": {
0: "2021-06-12",
1: "2021-11-13",
2: "2022-02-27",
3: "2021-04-14",
4: "2021-07-11",
},
"date_month_end": {
0: "2021-09-14",
1: "2022-01-13",
2: "2022-04-12",
3: "2021-06-18",
4: "2021-10-16",
},
}
)
print(df)
id price date_month_start date_month_end
0 1 20 2021-06-12 2021-09-14
1 1 41 2021-11-13 2022-01-13
2 1 61 2022-02-27 2022-04-12
3 2 68 2021-04-14 2021-06-18
4 2 10 2021-07-11 2021-10-16
But I would like to create a column holding each first of the month that falls between the start and end date, repeating the row (with every other column unchanged) when more than one first of the month falls in that range.
For instance, if the start date is March 12, 2021 and the end date is June 04, 2021, then the new column should contain April 1st 2021, May 1st 2021 and June 1st 2021. Since there are three values for the new column, the row should be repeated three times, copying the other column values.
The output data should look like:
id price date_month_start date_month_end date_month
0 1 20 2021-06-12 2021-09-14 2021-07-01
1 1 20 2021-06-12 2021-09-14 2021-08-01
2 1 20 2021-06-12 2021-09-14 2021-09-01
3 1 41 2021-11-13 2022-01-13 2021-12-01
4 1 41 2021-11-13 2022-01-13 2022-01-01
5 1 61 2022-02-27 2022-04-12 2022-03-01
6 1 61 2022-02-27 2022-04-12 2022-04-01
7 2 68 2021-04-14 2021-06-18 2021-05-01
8 2 68 2021-04-14 2021-06-18 2021-06-01
9 2 10 2021-07-11 2021-10-16 2021-08-01
10 2 10 2021-07-11 2021-10-16 2021-09-01
11 2 10 2021-07-11 2021-10-16 2021-10-01
I am new to Python. Does anyone have any pointers on how to do this? I can get the first day of the month from a date column, but this is a different problem.
Here is one way to do it:
from pandas.tseries.offsets import MonthEnd

# Convert into pandas datetimes
df['date_month_start'] = pd.to_datetime(df['date_month_start'])
df['date_month_end'] = pd.to_datetime(df['date_month_end'])

# For each row of 'df', find month starts between start and end date
# Duplicate the row and add new column
# Store new intermediate dataframe in list (dfs)
dfs = []
for i in range(df.shape[0]):
    temp_df = df.loc[i, :]
    new_month = pd.Series(
        [
            temp_df["date_month_start"] + MonthEnd(n) + pd.Timedelta(1, "d")
            for n in range(1, 13)
            if temp_df["date_month_start"] + MonthEnd(n) + pd.Timedelta(1, "d")
            < temp_df["date_month_end"]
        ]
    )
    temp_df = pd.DataFrame([temp_df.to_list() for _ in range(len(new_month))])
    temp_df[4] = new_month
    dfs.append(temp_df)

# Concat intermediate dataframes into one
# (loop variable renamed so it does not overwrite the original 'df')
new_df = dfs[0]
for part in dfs[1:]:
    new_df = pd.concat([new_df, part])

# Cleanup
new_df.columns = ["id", "price", "date_month_start", "date_month_end", "date_month"]
new_df = new_df.reset_index(drop=True)
print(new_df)
# Output
id price date_month_start date_month_end date_month
0 1 20 2021-06-12 2021-09-14 2021-07-01
1 1 20 2021-06-12 2021-09-14 2021-08-01
2 1 20 2021-06-12 2021-09-14 2021-09-01
3 1 41 2021-11-13 2022-01-13 2021-12-01
4 1 41 2021-11-13 2022-01-13 2022-01-01
5 1 61 2022-02-27 2022-04-12 2022-03-01
6 1 61 2022-02-27 2022-04-12 2022-04-01
7 2 68 2021-04-14 2021-06-18 2021-05-01
8 2 68 2021-04-14 2021-06-18 2021-06-01
9 2 10 2021-07-11 2021-10-16 2021-08-01
10 2 10 2021-07-11 2021-10-16 2021-09-01
11 2 10 2021-07-11 2021-10-16 2021-10-01
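
A shorter variant of the same idea is sketched below, assuming `pd.date_range` with `freq="MS"` (month start) is acceptable for generating the dates. It is not the answer's method: `date_range` is inclusive at both ends, so a start or end date that itself falls on the 1st would be included, whereas the loop above uses a strict `<` on the end date. The name `out` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "price": [20, 41, 61, 68, 10],
    "date_month_start": ["2021-06-12", "2021-11-13", "2022-02-27",
                         "2021-04-14", "2021-07-11"],
    "date_month_end": ["2021-09-14", "2022-01-13", "2022-04-12",
                       "2021-06-18", "2021-10-16"],
})
df["date_month_start"] = pd.to_datetime(df["date_month_start"])
df["date_month_end"] = pd.to_datetime(df["date_month_end"])

# One list of month-start dates per row
df["date_month"] = [
    list(pd.date_range(start, end, freq="MS"))
    for start, end in zip(df["date_month_start"], df["date_month_end"])
]

# explode repeats the other columns once per list element;
# a row with no month start in its range would get a missing value here
out = df.explode("date_month").reset_index(drop=True)
out["date_month"] = pd.to_datetime(out["date_month"])
print(out)
```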

Resample 10D but until end of months

I would like to resample a DataFrame at a 10-day frequency, but always ending the last 10-day block (decade) at the end of the month.
Example:
print(df)
 data
index
2010-01-01 145.08
2010-01-02 143.69
2010-01-03 101.06
2010-01-04 57.63
2010-01-05 65.46
...
2010-02-24 48.06
2010-02-25 87.41
2010-02-26 71.97
2010-02-27 73.1
2010-02-28 41.43
Applying something like df.resample('10DM').mean() should give:
data
index
2010-01-10 97.33
2010-01-20 58.58
2010-01-31 41.43
2010-02-10 35.17
2010-02-20 32.44
2010-02-28 55.44
Note that the 1st and 2nd decades are normal 10D resamples, but the 3rd can span 8, 9, 10 or 11 days depending on the month and year.
Thanks in advance.
Sample data (easy to check):
# dti = pd.date_range('2010-01-01', '2010-02-28', freq='D')
# df = pd.DataFrame({"value": np.arange(1, len(dti)+1)}, index=dti)
>>> df
value
2010-01-01 1
2010-01-02 2
2010-01-03 3
2010-01-04 4
2010-01-05 5
...
2010-02-24 55
2010-02-25 56
2010-02-26 57
2010-02-27 58
2010-02-28 59
You need to create groups by (days, month, year):
grp = df.groupby([pd.cut(df.index.day, [0, 10, 20, 31]),
                  pd.Grouper(freq='M'),
                  pd.Grouper(freq='Y')])
Now you can compute the mean for each group:
out = grp['value'].apply(lambda x: (x.index.max(), x.mean())).apply(pd.Series) \
                  .reset_index(drop=True).rename(columns={0:'date', 1:'value'}) \
                  .set_index('date').sort_index()
Output result:
>>> out
value
date
2010-01-10 5.5
2010-01-20 15.5
2010-01-31 26.0
2010-02-10 36.5
2010-02-20 46.5
2010-02-28 55.5
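
As an alternative sketch, each row can be labelled with the last day of its 10-day block and the frame grouped on that label. This uses the sample data from this answer; the names `block`, `month_start`, `month_end` and `label` are illustrative, and the approach assumes a daily `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2010-01-01", "2010-02-28", freq="D")
df = pd.DataFrame({"value": np.arange(1, len(dti) + 1)}, index=dti)

# Block number within the month: 0 for days 1-10, 1 for days 11-20,
# 2 for day 21 through the end of the month
day = df.index.day.to_numpy()
block = np.minimum((day - 1) // 10, 2)

month_start = df.index.to_period("M").to_timestamp()
month_end = df.index + pd.offsets.MonthEnd(0)  # rolls forward to month end

# Label = the 10th or the 20th for the first two blocks, month end for the third
label = np.where(
    block < 2,
    (month_start + pd.to_timedelta(block * 10 + 9, unit="D")).to_numpy(),
    month_end.to_numpy(),
)

out = df.groupby(label)["value"].mean()
print(out)
```

On the sample data this gives the same six means as the output shown above, indexed by 2010-01-10, 2010-01-20, 2010-01-31, 2010-02-10, 2010-02-20 and 2010-02-28.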

Is there a way of group by month in Pandas starting at specific day number?

I'm trying to group some data by month in Python, but I need each month to start on the 25th. Is there a way to do that in pandas?
For weeks there is a way to start on Monday, Tuesday, and so on, but for months it is always the full calendar month.
pd.Grouper(key='date', freq='M')
You could offset the dates by 24 days and groupby:
np.random.seed(1)
dates = pd.date_range('2019-01-01', '2019-04-30', freq='D')
df = pd.DataFrame({'date': dates,
                   'val': np.random.uniform(0, 1, len(dates))})
# for groupby
s = df['date'].sub(pd.DateOffset(24))
(df.groupby([s.dt.year, s.dt.month], as_index=False)
   .agg({'date': 'min', 'val': 'sum'})
)
gives
date val
0 2019-01-01 10.120368
1 2019-01-25 14.895363
2 2019-02-25 14.544506
3 2019-03-25 17.228734
4 2019-04-25 3.334160
Another example:
np.random.seed(1)
dates = pd.date_range('2019-01-20', '2019-01-30', freq='D')
df = pd.DataFrame({'date': dates,
                   'val': np.random.uniform(0, 1, len(dates))})
s = df['date'].sub(pd.DateOffset(24))
df['groups'] = df.groupby([s.dt.year, s.dt.month]).cumcount()
gives
date val groups
0 2019-01-20 0.417022 0
1 2019-01-21 0.720324 1
2 2019-01-22 0.000114 2
3 2019-01-23 0.302333 3
4 2019-01-24 0.146756 4
5 2019-01-25 0.092339 0
6 2019-01-26 0.186260 1
7 2019-01-27 0.345561 2
8 2019-01-28 0.396767 3
9 2019-01-29 0.538817 4
10 2019-01-30 0.419195 5
You can see how the cumcount restarts on day 25.
I prepared the following test DataFrame:
Dat Val
0 2017-03-24 0
1 2017-03-25 0
2 2017-03-26 1
3 2017-03-27 0
4 2017-04-24 0
5 2017-04-25 0
6 2017-05-24 0
7 2017-05-25 2
8 2017-05-26 0
The first step is to compute a "shifted date" column:
df['Dat2'] = df.Dat + pd.DateOffset(days=-24)
The result is:
Dat Val Dat2
0 2017-03-24 0 2017-02-28
1 2017-03-25 0 2017-03-01
2 2017-03-26 1 2017-03-02
3 2017-03-27 0 2017-03-03
4 2017-04-24 0 2017-03-31
5 2017-04-25 0 2017-04-01
6 2017-05-24 0 2017-04-30
7 2017-05-25 2 2017-05-01
8 2017-05-26 0 2017-05-02
As you can see, the March dates in Dat2 begin at the original date 2017-03-25,
and so on.
The value of 1 falls in March (in Dat2) and the value of 2 falls in May (also in Dat2).
Then, to compute e.g. a sum by month, we can run:
df.groupby(pd.Grouper(key='Dat2', freq='MS')).sum()
getting:
Val
Dat2
2017-02-01 0
2017-03-01 1
2017-04-01 0
2017-05-01 2
So we have the correct grouping:
1 is in March,
2 is in May.
The advantage over the other answer is that all dates fall on the first
day of a month, bearing in mind of course that e.g. 2017-03-01 in the
result means the period from 2017-03-25 to 2017-04-24 (inclusive).
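
If labels that show the actual start of each period are preferred, the shift can be undone after snapping to the month start. This is a sketch on the test DataFrame from this answer; the names `shift` and `period_start` are illustrative and not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Dat": pd.to_datetime([
        "2017-03-24", "2017-03-25", "2017-03-26", "2017-03-27", "2017-04-24",
        "2017-04-25", "2017-05-24", "2017-05-25", "2017-05-26",
    ]),
    "Val": [0, 0, 1, 0, 0, 0, 0, 2, 0],
})

shift = pd.Timedelta(days=24)

# Shift back so the 25th lands on the 1st, snap to the month start,
# then shift forward again so each label is the 25th that opens the period
period_start = (df["Dat"] - shift).dt.to_period("M").dt.to_timestamp() + shift

out = df.groupby(period_start)["Val"].sum()
print(out)
```

Here the group labelled 2017-03-25 covers 2017-03-25 through 2017-04-24 and holds the value 1, and the group labelled 2017-05-25 holds the value 2.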

Date Time Values Rearrangement

Hi, I have a data frame df with the following columns. The values continue like this for the whole month.
Timestamp Count
0 2017-10-01 00:00:00 783
1 2017-10-01 01:00:00 662
2 2017-10-01 02:00:00 075
3 2017-10-01 03:00:00 272
4 2017-10-01 04:00:00 381
I want to arrange the values row-wise, with one row per day and one column per hour.
Required output:
      Hour1  Hour2  Hour3  .........  Hour24
Day1    783    662    075  .........
Day2  ...................................
Use pandas.pivot with Timestamp converted to date and hour, then finish with add_prefix and rename_axis:
df = (pd.pivot(df['Timestamp'].dt.date,
               df['Timestamp'].dt.hour + 1,
               df['Count'])
        .add_prefix('Hour')
        .rename_axis(None)
        .rename_axis(None, axis=1))
An alternative is set_index with unstack:
df = (df.set_index([df['Timestamp'].dt.date, df['Timestamp'].dt.hour + 1])['Count']
        .unstack()
        .add_prefix('Hour')
        .rename_axis(None)
        .rename_axis(None, axis=1))
print(df)
            Hour1  Hour2  Hour3  Hour4  Hour5
2017-10-01    783    662     75    272    381
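
In recent pandas versions, `pd.pivot` takes a DataFrame as its first argument, so a pivot call with three positional arrays may no longer run there. The sketch below assumes such a version and uses `DataFrame.pivot` with named columns instead; the helper column names `day` and `hour` and the result name `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2017-10-01 00:00:00", "2017-10-01 01:00:00", "2017-10-01 02:00:00",
        "2017-10-01 03:00:00", "2017-10-01 04:00:00",
    ]),
    "Count": [783, 662, 75, 272, 381],
})

wide = (
    df.assign(day=df["Timestamp"].dt.date,        # row label: calendar date
              hour=df["Timestamp"].dt.hour + 1)   # column label: 1-based hour
      .pivot(index="day", columns="hour", values="Count")
      .add_prefix("Hour")
      .rename_axis(None)
      .rename_axis(None, axis=1)
)
print(wide)
```

Like the answer's versions, this requires each (day, hour) pair to appear only once; duplicates would need an aggregating function such as `pivot_table`.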

Pandas - Group into 24-hour blocks, but not midnight-to-midnight

I have a time series. I'd like to group it into 24-hour blocks, from 8am to 7:59am the next day. I know how to group by date, but I've tried and failed to handle this 8-hour offset using TimeGroupers and DateOffsets.
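I think you can use Grouper with the base parameter (base was deprecated in pandas 1.1 in favour of offset, e.g. offset='8h', and removed in pandas 2.0):
print(df)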
date name
0 2015-06-13 00:21:25 1
1 2015-06-14 01:00:25 2
2 2015-06-14 02:54:48 3
3 2015-06-15 14:38:15 2
4 2015-06-15 15:29:28 1
print(df.groupby(pd.Grouper(key='date', freq='24h', base=8)).sum())
name
date
2015-06-12 08:00:00 1.0
2015-06-13 08:00:00 5.0
2015-06-14 08:00:00 NaN
2015-06-15 08:00:00 3.0
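
For pandas 1.1 and later, a sketch of the same grouping with the `offset` parameter of `pd.Grouper` could look like this. The data is the sample from this answer. The result name `out` is illustrative. Empty bins may show up differently (for example as 0 instead of NaN) depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25", "2015-06-14 01:00:25", "2015-06-14 02:54:48",
        "2015-06-15 14:38:15", "2015-06-15 15:29:28",
    ]),
    "name": [1, 2, 3, 2, 1],
})

# 24-hour bins whose edges sit at 08:00 instead of midnight
out = df.groupby(pd.Grouper(key="date", freq="24h", offset="8h"))["name"].sum()
print(out)
```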
As an alternative to @jezrael's method, you can use a custom grouper function:
start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})
def my_grouper(df, idx):
    # .loc replaces the .ix indexer, which was removed in pandas 1.0
    ts = df.loc[idx, 'Date']
    return ts.date() if ts.hour >= 8 else ts.date() - pd.Timedelta('1day')
df.groupby(lambda x: my_grouper(df, x)).size()
Test:
In [468]: df.head()
Out[468]:
Date
0 2016-01-01 07:59:59
1 2016-01-01 08:09:59
2 2016-01-01 08:19:59
3 2016-01-01 08:29:59
4 2016-01-01 08:39:59
In [469]: df.tail()
Out[469]:
Date
995 2016-01-08 05:49:59
996 2016-01-08 05:59:59
997 2016-01-08 06:09:59
998 2016-01-08 06:19:59
999 2016-01-08 06:29:59
In [470]: df.groupby(lambda x: my_grouper(df, x)).size()
Out[470]:
2015-12-31 1
2016-01-01 144
2016-01-02 144
2016-01-03 144
2016-01-04 144
2016-01-05 144
2016-01-06 144
2016-01-07 135
dtype: int64
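
The same day labels can also be computed without a per-row Python function, by shifting every timestamp back 8 hours and taking the date. This is a sketch on the test data above; the names `block` and `sizes` are illustrative.

```python
import pandas as pd

start_ts = "2016-01-01 07:59:59"
df = pd.DataFrame({"Date": pd.date_range(start_ts, freq="10min", periods=1000)})

# Timestamps before 08:00 fall back to the previous calendar day
block = (df["Date"] - pd.Timedelta(hours=8)).dt.date

sizes = df.groupby(block).size()
print(sizes)
```

On this data the counts match the output above: one row for 2015-12-31, 144 rows for each full block, and 135 rows for 2016-01-07.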