How to add month column in dataframe based on dates in data? - pandas

I want to categorize my data by a month column, e.g.
date        Month
2009-05-01  ==> May
I want to check outcomes by month. In this table I am excluding years and only want to keep months.

This is simple when using pd.Series.dt.month_name (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.month_name.html):
import pandas as pd
df = pd.DataFrame({
    # month-end dates; pandas >= 2.2 prefers the alias 'ME' over 'M'
    'date': pd.date_range('2000-01-01', '2010-01-01', freq='M')
})
df['month'] = df['date'].dt.month_name()  # e.g. 2009-05-01 -> 'May'
df.head()
Output
date month
0 2000-01-31 January
1 2000-02-29 February
2 2000-03-31 March
3 2000-04-30 April
4 2000-05-31 May
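To actually check outcomes by month, you can group on the new column. Grouping on the name alone would sort months alphabetically, so this minimal sketch also groups on dt.month to keep calendar order. The outcome column and its values are hypothetical, made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2009-05-01", "2010-05-15", "2009-06-03", "2011-01-20"]),
    "outcome": [1.0, 3.0, 2.0, 4.0],  # hypothetical outcome values
})
df["month"] = df["date"].dt.month_name()  # year is ignored, only the month name is kept
df["month_num"] = df["date"].dt.month     # 1..12, used only to keep calendar order

# Mean outcome per month across all years; drop the helper level afterwards
monthly = (
    df.groupby(["month_num", "month"])["outcome"]
    .mean()
    .reset_index(level="month_num", drop=True)
)
print(monthly)
```

May 2009 and May 2010 fall into the same group here, which is what dropping the year implies.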

Related

Pandas dates - pairing Mondays with other days from the same week

I have a dataframe full of Mondays or Tuesdays as dates and another dataframe full of Mondays, Tuesdays and Wednesdays. I'd like to match each date in the second dataframe with the Monday or Tuesday from the first dataframe that falls in the same week:
import pandas as pd
df1 = pd.DataFrame(['01-25-2022','01-17-2022'])
df2 = pd.DataFrame(['01-26-2022','01-27-2022','01-20-2022'])
So in that example I would like a third dataframe as output which combines df1 and df2:
df3 = pd.DataFrame([['01-25-2022','01-25-2022','01-17-2022'],['01-26-2022','01-27-2022','01-20-2022']]).T
You can get the week (Monday to Sunday) with .dt.to_period('W'); 'W' is an alias for 'W-SUN', i.e. weeks ending on Sunday:
df1 = pd.DataFrame({'A': ['01-25-2022','01-17-2022']},
dtype='datetime64[s]')
df2 = pd.DataFrame({'B': ['01-26-2022','01-27-2022','01-20-2022']},
dtype='datetime64[s]')
df1.merge(df2,
left_on=df1['A'].dt.to_period('W'),
right_on=df2['B'].dt.to_period('W'),
how='right'
).drop(columns='key_0')
output:
A B
0 2022-01-25 2022-01-26
1 2022-01-25 2022-01-27
2 2022-01-17 2022-01-20
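A variant of the same idea stores the week in a named column instead of merging on raw arrays, which avoids the auto-generated key_0 column and parses the month-day-year strings explicitly. This is a sketch, assuming each week contains at most one date in df1; if a week had both a Monday and a Tuesday, every matching row would be duplicated:

```python
import pandas as pd

df1 = pd.DataFrame({"A": pd.to_datetime(["01-25-2022", "01-17-2022"], format="%m-%d-%Y")})
df2 = pd.DataFrame({"B": pd.to_datetime(["01-26-2022", "01-27-2022", "01-20-2022"], format="%m-%d-%Y")})

# Label each date with its Monday-to-Sunday week
df1["week"] = df1["A"].dt.to_period("W")
df2["week"] = df2["B"].dt.to_period("W")

# how="right" keeps every row of df2 in its original order
out = df1.merge(df2, on="week", how="right").drop(columns="week")
print(out)
```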

Selecting the most recent and the 6th most recent months from a dataframe

I have a dataframe with 24 months of dates. How do I create a new dataframe that only includes the most recent month in the dataframe and the 6th (or nth) most recent month?
You can test whether each date has the same year and month as the current date, or as the current date minus 6 months. Note that this is relative to today, not to the latest date in the data:
import datetime as dt
import pandas as pd
df = pd.DataFrame({"Date": pd.date_range(dt.date(2019, 9, 1), dt.date(2021, 9, 1), freq="M")})
t = pd.to_datetime("today")
td = t - pd.DateOffset(months=6)  # six calendar months back
mask = (df.Date.dt.year.eq(t.year) & df.Date.dt.month.eq(t.month)) | (df.Date.dt.year.eq(td.year) & df.Date.dt.month.eq(td.month))
df2 = df[mask]
print(df2)
output (depends on the current date; this run was in February 2021):
Date
11 2020-08-31
17 2021-02-28
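If "most recent" should instead mean the latest month present in the data, one possible sketch compares monthly periods against the maximum. It assumes monthly dates and reads "n months back" the same way as the answer above (latest minus 6); n is a hypothetical parameter:

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2019-09-01", "2021-09-01", freq="MS")})

months = df["Date"].dt.to_period("M")  # e.g. 2021-09-01 -> Period('2021-09', 'M')
latest = months.max()                  # most recent month in the data, not today
n = 6                                  # hypothetical: how many months back to look

# Period - int moves back by that many months
mask = (months == latest) | (months == latest - n)
df2 = df[mask]
print(df2)
```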

Converting daily data to monthly and getting each month's last value in pandas

My data consists of daily values that belong to a specific month and year, like this.
I want to convert the daily data to monthly data and use the last value of each month as that month's return value.
For example:
AccountId, Date, Return
1 2016-01 -4.1999 (because this is the last return value of January, 1/29/16)
1 2016-02 0.19 (same here, the last value of February, 2/29/16)
and so on.
I've looked at some topics about converting daily data to monthly data, but after the conversion they take the mean() or sum() of each month as the return value. Instead, I want the last return value of each month.
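You can group by AccountId and the year-month. Convert the column to datetime first and then format it as year-month with df['Date'].dt.strftime('%Y-%m'). Then just use last(), which takes the last non-null value in each group in the existing row order, so make sure the rows are sorted by date: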
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Sample data:
In[1]:
AccountId Date Return
0 1 1/7/16 15
1 1 1/29/16 10
2 1 2/1/16 25
3 1 2/15/16 20
4 1 2/28/16 30
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Out[1]:
AccountId Date Return
0 1 2016-01 10
1 1 2016-02 30
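A variation on the same approach groups by dt.to_period('M') instead of a formatted string, which keeps the month as a real period value, and sorts first so that last() really picks the latest day. This sketch rebuilds the sample data above; the explicit format string is an assumption about how the dates are written:

```python
import pandas as pd

df = pd.DataFrame({
    "AccountId": [1, 1, 1, 1, 1],
    "Date": ["1/7/16", "1/29/16", "2/1/16", "2/15/16", "2/28/16"],
    "Return": [15, 10, 25, 20, 30],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
df = df.sort_values(["AccountId", "Date"])  # last() depends on row order

# One row per account and calendar month, keeping that month's last Return
monthly = (
    df.groupby(["AccountId", df["Date"].dt.to_period("M")])["Return"]
    .last()
    .reset_index()
)
print(monthly)
```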

How to add a yearly amount to daily data in Pandas

I have two DataFrames in pandas. One of them has data every month, the other one has data every year. I need to do some computation where the yearly value is added to the monthly value.
Something like this:
df1, monthly:
2013-01-01 1
2013-02-01 1
...
2014-01-01 1
2014-02-01 1
...
2015-01-01 1
df2, yearly:
2013-01-01 1
2014-01-01 2
2015-01-01 3
And I want to produce something like this:
2013-01-01 (1+1) = 2
2013-02-01 (1+1) = 2
...
2014-01-01 (1+2) = 3
2014-02-01 (1+2) = 3
...
2015-01-01 (1+3) = 4
Where the value of the monthly data is added to the value of the yearly data depending on the year (first value in the parenthesis is the monthly data, second value is the yearly data).
Assuming the dates in your monthly DataFrame df are in a column called date, you can obtain the year through the .dt accessor:
pd.to_datetime(df.date).dt.year
Add a column like that to your monthly DataFrame and call it year.
Do the same for the yearly DataFrame.
Merge the monthly and yearly DataFrames on year, specifying how='left'.
The resulting DataFrame has both amount columns; now just add them.
Example
month_df = pd.DataFrame({
'date': ['2013-01-01', '2013-02-01', '2014-02-01'],
'amount': [1, 2, 3]})
year_df = pd.DataFrame({
'date': ['2013-01-01', '2014-02-01', '2015-01-01'],
'amount': [7, 8, 9]})
month_df['year'] = pd.to_datetime(month_df.date).dt.year
year_df['year'] = pd.to_datetime(year_df.date).dt.year
>>> pd.merge(
month_df,
year_df,
left_on='year',
right_on='year',
how='left')
amount_x date_x year amount_y date_y
0 1 2013-01-01 2013 7 2013-01-01
1 2 2013-02-01 2013 7 2013-01-01
2 3 2014-02-01 2014 8 2014-02-01
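To finish the computation, add the two amount columns after the merge. This sketch uses the values from the question (monthly values of 1, yearly values 1, 2, 3); the column names value, value_month and value_year are illustrative choices, not taken from the original:

```python
import pandas as pd

monthly = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-02-01", "2014-01-01", "2014-02-01", "2015-01-01"]),
    "value": [1, 1, 1, 1, 1],
})
yearly = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2014-01-01", "2015-01-01"]),
    "value": [1, 2, 3],
})

monthly["year"] = monthly["date"].dt.year
yearly["year"] = yearly["date"].dt.year

# Left merge on year so every monthly row picks up its year's value
merged = monthly.merge(yearly[["year", "value"]], on="year", how="left",
                       suffixes=("_month", "_year"))
merged["total"] = merged["value_month"] + merged["value_year"]
print(merged[["date", "total"]])
```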

Pandas Dataframe merging columns

I have a pandas dataframe like the following
Year Month Day Securtiy Trade Value NewDate
2011 1 10 AAPL Buy 1500 0
My question is: how can I merge the columns Year, Month and Day into the column NewDate,
so that NewDate looks like the following?
2011-1-10
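The best way is to parse the columns when reading the CSV (note that combining several columns through nested lists in parse_dates is deprecated since pandas 2.2; on newer versions, build the date after reading instead):
In [1]: df = pd.read_csv('foo.csv', sep=r'\s+', parse_dates=[['Year', 'Month', 'Day']])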
In [2]: df
Out[2]:
Year_Month_Day Securtiy Trade Value NewDate
0 2011-01-10 00:00:00 AAPL Buy 1500 0
You can also do this for a file without a header row by supplying the column names while reading:
pd.read_csv(input_file, header=None, names=['Year', 'Month', 'Day', 'Security', 'Trade', 'Value'], parse_dates=[['Year', 'Month', 'Day']])
If the data is already in your DataFrame, you could use a row-wise apply:
In [11]: df['Date'] = df.apply(lambda s: pd.Timestamp('%s-%s-%s' % (s['Year'], s['Month'], s['Day'])), axis=1)
In [12]: df
Out[12]:
Year Month Day Securtiy Trade Value NewDate Date
0 2011 1 10 AAPL Buy 1500 0 2011-01-10 00:00:00
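To get the string 2011-1-10 itself, concatenate the columns (they are integers, so cast them to str first):
df['NewDate'] = df['Year'].astype(str) + '-' + df['Month'].astype(str) + '-' + df['Day'].astype(str)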
You can create a new Timestamp as follows:
df['newDate'] = df.apply(lambda x: pd.Timestamp('{0}-{1}-{2}'
                                                .format(x.Year, x.Month, x.Day)),
                         axis=1)
>>> df
Year Month Day Securtiy Trade Value NewDate newDate
0 2011 1 10 AAPL Buy 1500 0 2011-01-10
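A vectorized alternative to the row-wise apply calls above is to pass the three columns to pd.to_datetime, which can assemble datetimes from a DataFrame with year, month and day columns. This sketch renames the columns to lowercase to match those expected names, and rebuilds the single sample row from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2011], "Month": [1], "Day": [10],
    "Securtiy": ["AAPL"], "Trade": ["Buy"], "Value": [1500], "NewDate": [0],
})

# pd.to_datetime assembles dates from year/month/day columns in one call
parts = df[["Year", "Month", "Day"]].rename(columns=str.lower)
df["NewDate"] = pd.to_datetime(parts)
print(df)
```

This avoids calling Python code once per row, which matters on large frames.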