How to find the last month data based on current date? - pandas

Is there a way to take the last month of data relative to a given date? I have searched a lot but can't find a good, precise solution. If the current date is at index 420, the date is 2012-01-09. I want a data frame with the data from 2011-12-09 until 2012-01-09.
import pandas as pd
import numpy as np
times = pd.DataFrame(pd.date_range('2012-01-01', '2012-04-01', freq='30min'), columns=['date'])
times['date'] = pd.to_datetime(times['date'])
times['value'] = np.random.randint(1, 6, times.shape[0])
months = times.iloc[0:420].sort_values(by='date', ascending=True).set_index('date').last('1M')
Using .last, the results end on 2012-01-01, as this is the last month. I understand that, but is there a way to get the last month of data without using timedelta or relativedelta? With both of them, an error is raised if a date is missing, which is also a problem.
Thank you.

I think what you're looking for is pd.Period. You can convert all of your datetimes to monthly periods and then filter on those.
# turn your datetimes into month periods
times["month"] = times["date"].dt.to_period("M")
# turn your search date into a month period
your_date = pd.Period(your_date, "M")
# select the rows that fall in that month
times[times["month"] == your_date]
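Note that the period approach selects a calendar month. If the goal is the rolling window from the question (2011-12-09 to 2012-01-09), a boolean mask built with pd.DateOffset is one possible sketch. It compares dates instead of looking up exact labels, so missing timestamps do not raise an error. The sample range is extended back to 2011-11-01 here (an assumption, since the question's data starts on 2012-01-01), and the names current, start and last_month are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data; the start is moved back to 2011-11-01 (assumption) so that
# the one-month window before 2012-01-09 actually contains rows.
times = pd.DataFrame({"date": pd.date_range("2011-11-01", "2012-04-01", freq="30min")})
times["value"] = np.random.randint(1, 6, len(times))

current = pd.Timestamp("2012-01-09")
start = current - pd.DateOffset(months=1)  # calendar-aware: 2011-12-09

# Comparison-based selection: rows are kept if they fall inside the window,
# so gaps in the data are simply skipped instead of causing a KeyError.
mask = (times["date"] >= start) & (times["date"] <= current)
last_month = times.loc[mask]

print(last_month["date"].min(), last_month["date"].max())
```

Both ends of the window are inclusive here; change <= to < if the current timestamp itself should be excluded.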

Related

Pandas DateTimeSlicing for specific months per year

I have read a lot about pandas and datetime slicing, but I haven't found a solution for my problem yet. I hope you can give me some good advice!
I have a data frame with a DatetimeIndex and, for example, a single column of floats. The time series covers about 60 years.
For example:
idx = pd.Series(pd.date_range("2016-11-1", freq="M", periods=48))
dft = pd.DataFrame(np.random.randn(48,1),columns=["NW"], index=idx)
I want to aggregate the column "NW" as sum() per month. I have to solve two problems.
The year begins in November and ends in October.
I have two periods per 12 months to analyse:
a) from November to End of April in the following year and
b) from May to End of October in the same year
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with pandas methods.
Do you have any tips?
Best regards Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot of a time series with only the sum() for every month from November to April along the timeline (60 years until now), and the same for the values from May to October.
This is my solution so far. Not the shortest way, I think, but it works fine.
from datetime import date
d_precipitation_winter = {}
# for each year except the current (last) one
for year in dft.index.year.unique()[:-1]:
    # start and end dates that mark the winter months
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 4, 30)
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
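A possible loop-free alternative, as a rough sketch only: label every row with the November-based year it belongs to and with its half-year, then let groupby do the sums. The daily sample data (all ones, so the sums equal day counts) and the names hydro_year, season and seasonal_sums are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: every value is 1.0, so each sum is a day count.
idx = pd.date_range("2016-11-01", "2020-10-31", freq="D")
dft = pd.DataFrame({"NW": np.ones(len(idx))}, index=idx)

# The year starts in November: Nov/Dec keep their calendar year,
# Jan..Oct are assigned to the previous calendar year.
hydro_year = np.where(dft.index.month >= 11, dft.index.year, dft.index.year - 1)

# Nov..Apr is labelled "winter", May..Oct "summer".
season = np.where(dft.index.month.isin([11, 12, 1, 2, 3, 4]), "winter", "summer")

# One row per November-based year, one column per half-year.
seasonal_sums = dft.groupby([hydro_year, season])["NW"].sum().unstack()
print(seasonal_sums)
```

If monthly sums inside each half-year are needed instead, adding dft.index.month as a third grouping key should give them.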

Remove hours and extract only month and year

I am trying to keep only the month and year in this df. I tried several solutions but none of them works. Can you help me?
You need to do this (as you posted no data, you'll need to adapt this to your case):
from datetime import datetime
datetime_object = datetime.now()
print(datetime_object)
2021-11-30 15:57:20.812209
And to get the year and month do this:
new_date_month= datetime_object.month
print(new_date_month)
new_date_year = datetime_object.year
print(new_date_year)
11
2021
If you need them as new columns in your df:
df['year']=datetime_object.year
df['Month']=datetime_object.month
Note that if your column is not a datetime, this will not work. Given the format of the dates you have, you will need to parse them first:
st = '2021-11-30 15:57:20.812209'
datetime.strptime(st, '%Y-%m-%d %H:%M:%S.%f')
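For a whole column rather than a single value, a pandas-based sketch could look like the following. The column name date and the two sample rows are assumptions, since the question shows no data. pd.to_datetime parses the strings, and the .dt accessor then extracts the parts row by row.

```python
import pandas as pd

# Hypothetical sample data; the question did not include any.
df = pd.DataFrame({"date": ["2021-11-30 15:57:20.812209",
                            "2021-12-01 08:00:00.000000"]})

# Parse the strings into datetime64 values first.
df["date"] = pd.to_datetime(df["date"])

# Per-row year and month as integer columns.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Or a single "YYYY-MM" string column with day and time dropped.
df["year_month"] = df["date"].dt.strftime("%Y-%m")
print(df)
```

If the result should remain sortable as dates rather than strings, df["date"].dt.to_period("M") is another option.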

Generate dataframe with timeseries index starting today and fixed interval

I'm trying to generate a pandas DataFrame with a time series index at a fixed interval. As input parameters I need to provide the start and end dates. The challenge is that the generated index starts either at a month start with freq='3MS' or at a month end with freq='3M'. The interval cannot be defined as a number of days, because every year needs to have exactly 4 periods and the index needs to begin on the given start date.
The expected output should be in this case:
2020-10-05
2021-01-05
2021-04-05
2021-10-05
Any ideas appreciated.
interpolated = pd.DataFrame(index=pd.date_range('2020-10-05', '2045-10-05', freq='3M'), columns=['dummy'])
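One possible approach, shown as a sketch: pass a pd.DateOffset(months=3) object as freq instead of an anchored alias such as '3MS' or '3M'. A DateOffset steps in calendar months from whatever start date is given, so the day of month (the 5th here) is preserved and every year gets exactly four entries.

```python
import pandas as pd

# DateOffset(months=3) is not anchored to month start or month end,
# so the index begins exactly on the given start date.
index = pd.date_range("2020-10-05", "2045-10-05", freq=pd.DateOffset(months=3))
interpolated = pd.DataFrame(index=index, columns=["dummy"])

print(interpolated.index[:4])
```

A start date on the 29th to 31st would be clipped to the month end in shorter months, which may or may not be acceptable for a given use case.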

Filter DataFrame Pandas Datatime

I have a forecast of 24 months in my DataFrame. How can I filter the dates down to 12 months?
I know how to filter by a fixed date.
But my dates are always extended by one month, so I need a variable filter.
My solution should filter 12 months from the current month on.
Thanks a lot
Try this:
from datetime import date
from dateutil.relativedelta import relativedelta
df = df[df['Date_column_name'] >= (date.today() + relativedelta(months=+12))]
Hope it helps...
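If the goal is to keep only the 12 months starting with the current month, a window with both a lower and an upper bound may be closer to what was asked. The sketch below relies on assumptions: the helper next_12_months, the column name Date and the monthly sample data are all hypothetical, and the today parameter exists only to make the example reproducible.

```python
import pandas as pd

def next_12_months(df, date_col, today=None):
    """Keep rows from the first day of the current month up to,
    but not including, the same day twelve months later."""
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    start = today.normalize().replace(day=1)  # first day of current month
    end = start + pd.DateOffset(months=12)    # exclusive upper bound
    mask = (df[date_col] >= start) & (df[date_col] < end)
    return df[mask]

# Hypothetical 24-month forecast with one row per month.
forecast = pd.DataFrame({"Date": pd.date_range("2024-01-01", periods=24, freq="MS")})
forecast["value"] = range(24)

filtered = next_12_months(forecast, "Date", today="2024-03-15")
print(filtered)
```

Using pd.Timestamp on both sides of the comparison also avoids mixing datetime64 columns with plain datetime.date objects.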

Make a column with todays date and date +1 in a series using pandas

I would like to make a column called Date.
The first entry should be today's date, i.e. 23/07/2019,
and each following row should be the date + 1, i.e. 24/07/2019, and so on.
This is easily done in Excel, but I tried this simple thing in pandas and I just can't figure out how!
I already have a data frame called df,
so putting down today's date is relatively simple.
df['Date'] = pd.Timestamp.now().date()
But I'm not sure which function would get me the date + 1 in the following rows.
Thanks
pd.date_range can use 'today' to set the dates. Normalize then create the Series yourself, otherwise pandas thinks the DatetimeIndex should be the Index too.
import pandas as pd
pd.Series(pd.date_range('today', periods=30, freq='D').normalize(),
          name='Date')
0 2019-07-23
1 2019-07-24
...
28 2019-08-20
29 2019-08-21
Name: Date, dtype: datetime64[ns]
If adding a new column to the DataFrame:
df['Date'] = pd.date_range('today', periods=len(df), freq='D').normalize()
pd.date_range is what you are looking for. To build a series of 31 days starting from today:
today = pd.Timestamp.now().normalize()
s = pd.date_range(today, today + pd.Timedelta(days=30), freq='D').to_series()