Selecting the most recent and the 6th most recent months from a dataframe - pandas

I have a dataframe with 24 months of dates. How do I create a new dataframe that includes only the most recent month in the dataframe and the 6th (or nth) most recent month?

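You can test for equality of year and month against the current date and against the current date minus 6 months.
import datetime as dt
import pandas as pd
df = pd.DataFrame({"Date":pd.date_range(dt.date(2019,9,1), dt.date(2021,9,1), freq="M")})
t = pd.to_datetime("today")
# DateOffset steps back exactly 6 calendar months (365//2 days is only an approximation)
td = t - pd.DateOffset(months=6)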
mask = (df.Date.dt.year.eq(t.year) & df.Date.dt.month.eq(t.month)) | (df.Date.dt.year.eq(td.year) & df.Date.dt.month.eq(td.month))
df2 = df[mask]
print(df2)
Output (when run in February 2021)
Date
11 2020-08-31
17 2021-02-28
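The question asks for the most recent month in the dataframe rather than the current calendar month, so a variant can anchor on the data itself. The following is a minimal sketch, assuming "6th most recent" means six months before the latest month (as in the answer above); the sample frame and the names `months`, `latest`, and `n` are illustrative only.

```python
import pandas as pd

# Illustrative data: 24 month-start dates, 2019-09-01 through 2021-08-01
df = pd.DataFrame({"Date": pd.date_range("2019-09-01", periods=24, freq="MS")})

n = 6  # how many months back from the latest month (assumption)

# Reduce each date to its year-month period, then compare periods
months = df["Date"].dt.to_period("M")
latest = months.max()
mask = (months == latest) | (months == latest - n)

df2 = df[mask]
print(df2)
```

Comparing monthly periods avoids separate year and month checks and does not depend on the day the code is run.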

Related

Subtract dates row by row if the ids are the same

I'd like to subtract dates if the next row's id is the same. I'm able to subtract dates, but I'm stuck on creating the condition that checks whether the next row has the same id.
d = {'date':['2021-01', '2020-01', '2020-05', '2021-01'], 'id':['a', 'a', 'b', 'b']}
df = pd.DataFrame(data=d)
date id
2021-01 a
2020-01 a
2020-05 b
2021-01 b
My code
df = df.sort_values(by=['id', 'date'])
df['date_diff'] = pd.to_datetime(df['date']) - pd.to_datetime(df['date'].shift())
result
date id date_diff
2020-01 a NaT
2021-01 a 366 days
2020-05 b -245 days
2021-01 b 245 days
In the expected result, the dates should only be subtracted when the ids are the same.
Chain with groupby
df['date'] = pd.to_datetime(df['date'])
df['date_diff'] = df.groupby('id')['date'].diff()
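For reference, here is a self-contained sketch of the groupby approach applied to the sample data from the question. It assumes the rows are sorted by id and date before taking the difference, as in the question's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2021-01", "2020-01", "2020-05", "2021-01"],
    "id": ["a", "a", "b", "b"],
})

# Parse the year-month strings (each becomes the first day of that month)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(by=["id", "date"])

# diff() runs within each id group, so the first row of every id gets NaT
df["date_diff"] = df.groupby("id")["date"].diff()
print(df)
```

With this data, the first row of each id is NaT, and the remaining rows are 366 days for id a and 245 days for id b.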

pandas add datetime column [duplicate]

This question already has answers here:
Pandas: create timestamp from 3 columns: Month, Day, Hour
(2 answers)
How to combine multiple columns in a Data Frame to Pandas datetime format
(3 answers)
Closed 1 year ago.
I have a dataframe where year, month, day, and hour are split into separate columns:
df.head()
Year Month Day Hour
0 2020 1 1 0
1 2020 1 1 1
2 2020 1 1 2
...
I'd like to add a proper datetime column to the dataframe so I end up with something along these lines:
df.head()
Year Month Day Hour datetime
0 2020 1 1 0 2020-01-01T00:00
1 2020 1 1 1 2020-01-01T01:00
2 2020 1 1 2 2020-01-01T02:00
...
I could add a loop that processes one row at a time, but that's not panda-esque.
Here are three things that don't work (not that I expected any of them to do so):
df['datetime'] = pd.to_datetime(datetime.datetime(df['Year'], df['Month'], df['Day'], df['Hour']))
df['datetime'] = pd.to_datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
df['datetime'] = pd.datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
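One possible approach, in line with the linked duplicates: `pd.to_datetime` can assemble datetimes from a DataFrame whose columns are named `year`, `month`, `day`, and `hour`. This is a minimal sketch using the column names from the question; the lowercase rename is a precaution, and `parts` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "Month": [1, 1, 1],
    "Day": [1, 1, 1],
    "Hour": [0, 1, 2],
})

# to_datetime accepts a DataFrame of date parts; rename to the
# lowercase unit names (year, month, day, hour) to be explicit
parts = df[["Year", "Month", "Day", "Hour"]].rename(columns=str.lower)
df["datetime"] = pd.to_datetime(parts)
print(df)
```

This stays vectorized, so no row-by-row loop is needed.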

Converting daily data to monthly and get months last value in pandas

I have daily data, where each row belongs to a specific month and year.
I want to convert the daily data to monthly data, using the last value of each month as that month's return value.
for example:
AccountId, Date, Return
1 2016-01 -4.1999 (because this return value is the last value of January, 1/29/16)
1 2016-02 0.19 (same here: the last value of February, 2/29/16)
and so on
I've looked at some topics about converting daily data to monthly data, but after the conversion they take the mean() or sum() of the month as the return value. I want the last return value of the month instead.
You can groupby AccountId and the Year-Month. Convert to datetime first and then format as Year-Month as follows: df['Date'].dt.strftime('%Y-%m'). Then just use last():
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Sample data:
In[1]:
AccountId Date Return
0 1 1/7/16 15
1 1 1/29/16 10
2 1 2/1/16 25
3 1 2/15/16 20
4 1 2/28/16 30
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Out[1]:
AccountId Date Return
0 1 2016-01 10
1 1 2016-02 30
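Note that last() returns the last row in the existing row order, so it only gives the latest date of each month if the rows are already sorted by date. A hedged variant that sorts first and groups on a monthly period is sketched below; the deliberately shuffled sample rows and the names `Month` and `monthly` are illustrative.

```python
import pandas as pd

# Same sample values as above, deliberately out of date order
df = pd.DataFrame({
    "AccountId": [1, 1, 1, 1, 1],
    "Date": ["2/28/16", "1/7/16", "2/1/16", "1/29/16", "2/15/16"],
    "Return": [30, 15, 25, 10, 20],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Sort so that the last row of each month is really the latest date
df = df.sort_values(["AccountId", "Date"])
df["Month"] = df["Date"].dt.to_period("M")

monthly = df.groupby(["AccountId", "Month"])["Return"].last().reset_index()
print(monthly)
```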

How to add month column in dataframe based on dates in data?

I want to categorize data by a month column,
e.g.
date Month
2009-05-01==>May
I want to check outcomes by month.
In this table I am excluding years and only want to keep months.
This is simple when using pd.Series.dt.month_name (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.month_name.html):
import pandas as pd
df = pd.DataFrame({
'date': pd.date_range('2000-01-01', '2010-01-01', freq='1M')
})
df['month'] = df.date.dt.month_name()
df.head()
Output
date month
0 2000-01-31 January
1 2000-02-29 February
2 2000-03-31 March
3 2000-04-30 April
4 2000-05-31 May

week number from given date in pandas

I have a data frame with two columns Date and value.
I want to add a new column named week_number that gives how many weeks back each date is from a given date.
import pandas as pd
df = pd.DataFrame(columns=['Date','value'])
df['Date'] = [ '04-02-2019','03-02-2019','28-01-2019','20-01-2019']
df['value'] = [10,20,30,40]
df
Date value
0 04-02-2019 10
1 03-02-2019 20
2 28-01-2019 30
3 20-01-2019 40
Suppose the given date is 05-02-2019.
Then I need a week_number column that gives how many weeks back each date in the Date column is from the given date.
The output should be
Date value week_number
0 04-02-2019 10 1
1 03-02-2019 20 1
2 28-01-2019 30 2
3 20-01-2019 40 3
How can I do this in pandas?
First convert the column to datetimes with to_datetime and dayfirst=True, then subtract it from the given date with rsub, convert the timedeltas to days, floor-divide by 7 and add 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['week_number'] = df['Date'].rsub(pd.Timestamp('2019-02-05')).dt.days // 7 + 1
#alternative
#df['week_number'] = (pd.Timestamp('2019-02-05') - df['Date']).dt.days // 7 + 1
print (df)
Date value week_number
0 2019-02-04 10 1
1 2019-02-03 20 1
2 2019-01-28 30 2
3 2019-01-20 40 3