Remove hours and extract only month and year - pandas

I am trying to keep only the month and year in this df. I tried several solutions, but none of them work. Can you help me?

You need to do this (as you posted no data, you'll need to adapt this to your case):
from datetime import datetime
datetime_object = datetime.now()
print(datetime_object)
2021-11-30 15:57:20.812209
And to get the year and month do this:
new_date_month= datetime_object.month
print(new_date_month)
new_date_year = datetime_object.year
print(new_date_year)
11
2021
If you need them as new columns in your df:
df['year']=datetime_object.year
df['Month']=datetime_object.month
Note that if your column is not a datetime, this will not work. Given the date format you have, you will need to do this first:
st = '2021-11-30 15:57:20.812209'
datetime.strptime(st, '%Y-%m-%d %H:%M:%S.%f')
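For a whole DataFrame column, the same idea can be expressed with pandas itself. The following is only a sketch: the column name `date` and the sample values are assumptions, since the question posted no data.

```python
import pandas as pd

# Hypothetical sample data; the column name "date" is an assumption
df = pd.DataFrame({"date": ["2021-11-30 15:57:20.812209",
                            "2022-01-05 08:00:00.000000"]})

# Convert the strings to datetime64 so the .dt accessor becomes available
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S.%f")

# Year and month as separate integer columns
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Or a single "YYYY-MM" string column that drops the day and the time
df["year_month"] = df["date"].dt.strftime("%Y-%m")
```

Unlike the scalar `datetime_object` above, the `.dt` accessor reads the year and month of each row.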

Related

Pandas DateTimeSlicing for specific months per year

I have read a lot about pandas and datetime slicing, but I haven't found a solution for my problem yet. I hope you can give me some good advice!
I have a data frame with a Datetimeindex and for example a single column with floats. The time series is about 60 years.
For example:
import numpy as np
import pandas as pd
idx = pd.Series(pd.date_range("2016-11-1", freq="M", periods=48))
dft = pd.DataFrame(np.random.randn(48,1),columns=["NW"], index=idx)
I want to aggregate the column "NW" as sum() per month. I have to solve two problems.
The year begins in November and ends in October.
I have two periods per 12 months to analyse:
a) from November to End of April in the following year and
b) from May to End of October in the same year
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with pandas methods.
Do you have any tips?
Best regards Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot for a time series with only the sum() for every month from November to April along the timeline (60 years until now), and the same for the values from May to October.
This is my solution so far. Not the shortest way, I think, but it works fine.
from datetime import date

d_precipitation_winter = {}
# For each year except the last (current) one
for year in dft.index.year.unique()[:-1]:
    # Start and end dates that mark the winter months
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 4, 30)
    # Sum all values between 1 November and 30 April of the following year
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
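A possible loop-free alternative is to label every row with a year that starts in November and with its half-year, then group on both. This is only a sketch under assumptions: the daily sample data (all ones, so the sums equal day counts) and the names `hydro_year`, `half` and `season_sums` are made up for illustration, and it produces one total per half-year rather than per calendar month.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data; every value is 1 so each sum is a day count
idx = pd.date_range("2016-11-01", "2020-10-31", freq="D")
dft = pd.DataFrame({"NW": np.ones(len(idx))}, index=idx)

month = dft.index.month
# A year that runs November to October: November and December
# are counted towards the following calendar year
hydro_year = np.where(month >= 11, dft.index.year + 1, dft.index.year)
# November to April is "winter", May to October is "summer"
half = np.where(month.isin([11, 12, 1, 2, 3, 4]), "winter", "summer")

# One row per shifted year, one column per half-year
season_sums = dft.groupby([hydro_year, half])["NW"].sum().unstack()
```

Here `season_sums.loc[2017, "winter"]` covers 2016-11-01 to 2017-04-30, and the two columns can be plotted separately along the timeline.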

Filter DataFrame Pandas Datatime

I have a forecast of 24 months in my DataFrame. How can I filter the dates down to 12 months?
I know how to filter by a fixed date.
But my dates are always extended by one month. So I need a variable filter.
My solution should be to filter 12 months from the current month on.
Thanks a lot
Try this:
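from datetime import date
import pandas as pd
from dateutil.relativedelta import relativedelta
# Wrap the date in pd.Timestamp so it compares cleanly with a datetime64 column
df = df[df['Date_column_name'] >= pd.Timestamp(date.today() + relativedelta(months=+12))]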
Hope it helps...
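If the goal is instead to keep the 12 months starting with the current month, rather than everything from 12 months ahead onward, a window with two bounds may be closer to what was asked. This is a sketch under assumptions: the helper `next_12_months`, its `today` parameter, and the sample forecast are hypothetical.

```python
import pandas as pd

def next_12_months(df, column, today=None):
    """Keep rows from the first day of the current month up to, but not
    including, the same day 12 months later. (Hypothetical helper.)"""
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    start = today.normalize().replace(day=1)   # first day of the current month
    end = start + pd.DateOffset(months=12)     # exclusive upper bound
    return df[(df[column] >= start) & (df[column] < end)]

# Hypothetical 24-month forecast with one row per month start
forecast = pd.DataFrame(
    {"Date_column_name": pd.date_range("2021-01-01", periods=24, freq="MS")}
)
filtered = next_12_months(forecast, "Date_column_name", today="2021-03-15")
```

Leaving `today` unset uses the current date, so the window moves forward by itself each month.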

How to find the last month data based on current date?

Is there any way, based on a date, to take the last month of data? I have searched a lot but I can't find a good and precise solution. If the current date is at index 420, the date is 2012-01-09. I want to have a data frame with data from 2011-12-09 until 2012-01-09.
import pandas as pd
import numpy as np
times = pd.DataFrame(pd.date_range('2012-01-01', '2012-04-01', freq='30min'), columns=['date'])
times['date'] = pd.to_datetime(times['date'])
times['value'] = np.random.randint(1, 6, times.shape[0])
months = times.iloc[0:420].sort_values(by='date', ascending=True).set_index('date').last('1M')
Using the .last command the results end on 2012-01-01 as this is the last month. I understand it but is there any way to find the last one month data without using timedelta or relative delta? In the case of both if a date is missing then an error appears which is also a problem.
Thank you.
I think what you're looking for is pd.Period. You can convert all of your datetimes to a month period and then search using that
# turn your datetimes to month periods
times["month"] = times["date"].dt.to_period("M")
# turn your search date to a period
your_date = pd.Period(your_date, "M")
# search times
times[times["month"] == your_date]
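Note that a month period selects a calendar month, not a rolling window. For a window such as 2011-12-09 to 2012-01-09, a boolean mask with `pd.DateOffset` is one option, and it does not fail when a particular date is missing from the data. This is a sketch under assumptions: the sample range is extended back to November 2011 so that the window actually contains data, and the names `start`, `end` and `last_month` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data, extended back to 2011-11-01 for illustration
times = pd.DataFrame(
    {"date": pd.date_range("2011-11-01", "2012-04-01", freq="30min")}
)
times["value"] = np.arange(len(times))

end = pd.Timestamp("2012-01-09")          # the "current" date
start = end - pd.DateOffset(months=1)     # same day one month earlier

# Boolean mask: rows outside the window are simply dropped,
# and missing dates inside it raise no error
last_month = times[(times["date"] >= start) & (times["date"] <= end)]
```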

Make a column with todays date and date +1 in a series using pandas

I would like to make a column named Date.
The first entry should be today's date, e.g. 23/07/2019,
and each following row should be the date + 1, e.g. 24/07/2019, and so on.
This is easily done in Excel, but I tried this simple thing in pandas and I just can't figure out how!
I already have a DataFrame called df,
so putting down today's date is relatively simple.
df.Date = pd.datetime.now().date()
But I'm not sure which function would get me the date + 1 in the following rows.
Thanks
pd.date_range can use 'today' to set the dates. Normalize then create the Series yourself, otherwise pandas thinks the DatetimeIndex should be the Index too.
import pandas as pd
pd.Series(pd.date_range('today', periods=30, freq='D').normalize(),
          name='Date')
0 2019-07-23
1 2019-07-24
...
28 2019-08-20
29 2019-08-21
Name: Date, dtype: datetime64[ns]
If adding a new column to the DataFrame:
df['Date'] = pd.date_range('today', periods=len(df), freq='D').normalize()
pd.date_range is what you are looking for. To build a series of 31 days starting from today:
today = pd.Timestamp.now().normalize()
s = pd.date_range(today, today + pd.Timedelta(days=30), freq='D').to_series()

Days between a date and a generic birthday in Pandas metaframe

I have in a dataframe:
Customer ID,
Customer date of birth,
date of purchase.
I need a function to calculate the signed distance in days between the birthday (the anniversary of the date of birth) and the date of purchase.
For example, if the date of birth is 20/12/1960 and the date of purchase is 16/01/2019, I need 27, which is 27 days after the birthday; if the date of purchase is 05/12/2018, I need -15, which is 15 days before the birthday.
Any suggestions?
Since you need to stay within the year of purchase, you need to extract the day of the year for the birthday and the purchase date which can be done using .dt.dayofyear as follows:
import pandas as pd
import numpy as np
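df = pd.DataFrame({'customer_id': [1, 2, 3],
                   'birthday': pd.to_datetime(['20/12/1960', '2/6/1980', '6/1/1972']),
                   'purchase_date': pd.to_datetime(['1/1/2004', '5/25/2018', '3/4/2010'])})
# Purchase minus birthday, so a positive value means "after the birthday";
# this stays within one calendar year and does not wrap around New Year
df['days_away'] = df['purchase_date'].dt.dayofyear - df['birthday'].dt.dayofyear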
df
You can separate this into two steps. First, create a new column for the looked-up birthday date. Second, subtract these two date columns to get the timedelta (and use .dt.days to get this in days).
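The two steps can be sketched as follows. This is one possible reading of the question, assuming that the relevant birthday is the anniversary closest to the purchase date, so that a December birthday and a January purchase are 27 days apart rather than 338. The helper name `days_from_birthday` is hypothetical.

```python
import pandas as pd

def days_from_birthday(birthday, purchase):
    """Signed days from the nearest birthday anniversary to the purchase.
    Positive means the purchase is after the birthday. (Hypothetical helper.)"""
    # Step 1: look up the anniversaries in the previous, same and next year
    candidates = [
        birthday + pd.DateOffset(years=purchase.year - birthday.year + k)
        for k in (-1, 0, 1)
    ]
    # Step 2: subtract and keep the difference with the smallest magnitude
    diffs = [(purchase - c).days for c in candidates]
    return min(diffs, key=abs)

# The two examples from the question
df = pd.DataFrame({
    "customer_id": [1, 2],
    "birthday": pd.to_datetime(["1960-12-20", "1960-12-20"]),
    "purchase_date": pd.to_datetime(["2019-01-16", "2018-12-05"]),
})
df["days_away"] = [
    days_from_birthday(b, p)
    for b, p in zip(df["birthday"], df["purchase_date"])
]
```

The dates are written in ISO format here to avoid day-first versus month-first ambiguity when parsing.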