Pandas DateTimeSlicing for specific months per year - pandas

I have read a lot about pandas and datetime slicing, but I haven't found a solution to my problem yet. I hope you can give me some good advice!
I have a data frame with a Datetimeindex and for example a single column with floats. The time series is about 60 years.
For example:
import numpy as np
import pandas as pd
idx = pd.date_range("2016-11-1", freq="M", periods=48)
dft = pd.DataFrame(np.random.randn(48, 1), columns=["NW"], index=idx)
I want to aggregate the column "NW" as sum() per month. I have to solve two problems:
1. The year begins in November and ends in October.
2. I have two periods per 12 months to analyse:
a) from November to the end of April in the following year and
b) from May to the end of October in the same year
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with built-in pandas methods.
Do you have any tips?
Best regards Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot of the time series with only the sum() for every month from November to April along the timeline (60 years until now), and the same for the values from May to October.

This is my solution so far. It is probably not the shortest way, but it works fine.
d_precipitation_winter = {}
# for each year except the current year
for year in dft.index.year.unique()[:-1]:
    # define start and end date to mark the winter months
    start_date = pd.Timestamp(year, 11, 1)
    end_date = pd.Timestamp(year + 1, 4, 30)
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
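A loop-free variant of the same idea might look like the sketch below. It is only an illustration under assumptions: the daily data are synthetic (every value is 1.0 so the sums are easy to check), and the names season_year, season, seasonal_sum and winter_monthly are made up for this example. Each row is labelled with the November-based year it belongs to and with its half-year, and groupby does the rest.

```python
import numpy as np
import pandas as pd

# Synthetic daily observations standing in for the real data (assumption).
idx = pd.date_range("2016-11-01", "2019-10-31", freq="D")
dft = pd.DataFrame({"NW": np.ones(len(idx))}, index=idx)

month = dft.index.month
year = dft.index.year

# The year starts in November: Nov/Dec keep their calendar year,
# Jan-Oct are assigned to the previous calendar year.
season_year = np.where(month >= 11, year, year - 1)

# Nov-Apr is labelled "winter", May-Oct is labelled "summer".
season = np.where((month >= 11) | (month <= 4), "winter", "summer")

# One row per November-based year, one column per half-year.
seasonal_sum = dft.groupby([season_year, season])["NW"].sum().unstack()

# Monthly sums along the timeline, restricted to the winter months.
monthly = dft["NW"].resample("MS").sum()
winter_monthly = monthly[monthly.index.month.isin([11, 12, 1, 2, 3, 4])]
```

seasonal_sum is indexed by the year in which each November-based year starts, so the row for 2016 covers November 2016 to October 2017. winter_monthly keeps a DatetimeIndex, which should make it straightforward to pass to a scatter plot.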

Related

How to include end date while calculating time difference in years?(Postgresql)

Please share your feedback on this problem. I need to calculate difference in 'years' and store it under a new column 'Age'.
While the formula works fine, it gives me incorrect output when I consider dates starting from 1st Jan of any year
For example: difference in years between 1st Jan 2019 and 31st Dec 2021 is 3 years - this includes end date in calculation. My result shows 2 years.
Here are the 2 date columns from which I am deriving the difference:
However, when I consider dates from 1st Jan - the result shows me one year less:
Here is the code I used to calculate difference:
UPDATE animals
SET age = abs(benchmarkdate :: date - birthdate :: date)/ 365;
Any help would be appreciated. Thank you.
I would use EXTRACT here and then take a difference of only the year components on the two dates:
UPDATE animals
SET age = EXTRACT(year FROM benchmarkdate) - EXTRACT(year FROM birthdate);
Note that you might even want to avoid doing this update, and instead just compute the age when you select. If you foresee the need to frequently do such an update, that would be a good indicator that you probably should change your approach.

How to use pd.Grouper with an offset?

I have a dataframe df indexed by DateTime, spanning multiple years. Now, I wish to group the data where each group will have 16th as starting date, and 15th of next month as ending date. How do I do this?
I tried df2.groupby(pd.Grouper(freq="MS", offset="16D")), but the offset doesn't seem to have any effect and it still gives me groups starting at the 1st of each month.
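As far as I know, the offset argument only takes effect for fixed-length frequencies (days, hours and so on), which would explain why it is ignored with "MS". One possible workaround, sketched below with synthetic data and a hypothetical column name value, is to shift every timestamp back by 15 days and group by the resulting month period. The 16th then maps to the 1st of the same month, and the 15th of the following month maps to the last day of that same month, so both land in the same group.

```python
import numpy as np
import pandas as pd

# Synthetic daily data (assumption); every value is 1.0 so sums equal day counts.
idx = pd.date_range("2021-01-01", "2021-04-30", freq="D")
df = pd.DataFrame({"value": np.ones(len(idx))}, index=idx)

# Shift back 15 days, then take the calendar month of the shifted timestamp.
# A key of 2021-01 therefore means "16 Jan 2021 through 15 Feb 2021".
period_key = (df.index - pd.Timedelta(days=15)).to_period("M")

grouped = df.groupby(period_key)["value"].sum()
```

The first and last groups are partial here because the sample data start on the 1st and end on the 30th.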

How to find the last month data based on current date?

Is there any way based on a date to take the last one month data? I have searched a lot but I can't find a good and precise solution. If the current date is on index 420 the date is 2012-01-09. I want to have a data frame with data from 2011-12-09 until 2012-01-09.
import pandas as pd
import numpy as np
times = pd.DataFrame(pd.date_range('2012-01-01', '2012-04-01', freq='30min'), columns=['date'])
times['date'] = pd.to_datetime(times['date'])
times['value'] = np.random.randint(1, 6, times.shape[0])
months = times.iloc[0:420].sort_values(by='date', ascending=True).set_index('date').last('1M')
Using the .last command, the results only go back to 2012-01-01, because that is the start of the last calendar month. I understand why, but is there any way to get the last one month of data without using timedelta or relativedelta? With both of those, an error is raised if a date is missing, which is also a problem.
Thank you.
I think what you're looking for is pd.Period. You can convert all of your datetimes to a month period and then search using that:
# turn your datetimes into month periods
times["month"] = times["date"].dt.to_period("M")
# turn your search date into a period
your_date = pd.Period(your_date, "M")
# search times
times[times["month"] == your_date]
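Note that a month period selects a calendar month. If the goal is the month leading up to a given date (2011-12-09 to 2012-01-09 in the question), label-based slicing on a sorted DatetimeIndex is one option. The sketch below is an assumption-laden illustration: the sample data are extended back to November 2011 so that the window actually contains rows, and the names end, start, indexed and last_month are made up. Slicing a sorted index with .loc does not require the boundary timestamps to be present in the data.

```python
import numpy as np
import pandas as pd

# Sample data, extended back to 2011-11-01 so the window is not empty (assumption).
times = pd.DataFrame(
    {"date": pd.date_range("2011-11-01", "2012-04-01", freq="30min")}
)
times["value"] = np.arange(len(times))

end = pd.Timestamp("2012-01-09")
start = end - pd.DateOffset(months=1)  # one calendar month earlier: 2011-12-09

# Sort by date so that label slicing works even when start or end is absent.
indexed = times.set_index("date").sort_index()
last_month = indexed.loc[start:end]  # both ends are inclusive
```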

Calculate average, max and min based on water year

I can calculate the yearly averages, max and min values based on taking the first day of a year as January First like:
yearly_avg=df2.groupby(years).mean()
yearly_sum=df2.groupby(years).sum()
yearly_MAX=df2.groupby(years).max()
yearly_MIN=df2.groupby(years).min()
I need to calculate averages, max and min numbers based on the water year where October 1st is the first day of a year. As an explanation of "water year": https://en.wikipedia.org/wiki/Water_year
Here is my sample file stored here:
https://drive.google.com/file/d/1AYi9vp3_DPXHoCPB_YkMQp68FvC_INrV/view?usp=sharing
How can I do that?
Thanks.
Just set the year to start on October instead of January. I am just happy your columns were already datetime types, this made it easier!
df = pd.read_excel('sample_water_year.xlsx')
df['# YEAR'] = df.Dates.dt.to_period('A-Sep') #year ends on sep
Note that each period is labelled with the year in which it ends, so you will have to subtract 1 if you want the water year to be labelled with the year in which it starts.
df['# YEAR'] = df['# YEAR'] - 1
Then simply find the summary statistics:
yearly_avg = df.groupby('# YEAR').mean()
yearly_sum = df.groupby('# YEAR').sum()
yearly_MAX = df.groupby('# YEAR').max()
yearly_MIN = df.groupby('# YEAR').min()
Hopefully this helps!
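For comparison, the same labelling can be computed with plain integer arithmetic, which avoids depending on a particular frequency alias. This is only a sketch: the data are synthetic, and the column names flow and water_year are hypothetical (the Dates column name follows the answer above). As in the answer, each water year is labelled with the calendar year in which it starts.

```python
import numpy as np
import pandas as pd

# Synthetic daily data covering two full water years (assumption).
dates = pd.date_range("2018-10-01", "2020-09-30", freq="D")
df = pd.DataFrame({"Dates": dates, "flow": np.arange(len(dates), dtype=float)})

# Oct-Dec keep their calendar year; Jan-Sep belong to the water year
# that started in October of the previous calendar year.
df["water_year"] = df["Dates"].dt.year - (df["Dates"].dt.month < 10).astype(int)

# Mean, sum, max and min per water year in one pass.
summary = df.groupby("water_year")["flow"].agg(["mean", "sum", "max", "min"])
```

If the water year should instead be labelled with the year in which it ends, add 1 for the months October to December rather than subtracting 1 for January to September.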

using groupby on pandas dataframe to group by financial year

I have a dataframe with a datetime64 column called DT. Is it possible to use groupby to group by financial year from April 1 to March 31?
For example,
Date | PE_LOW
2010-04-01 | 15.44
...
2011-03-31 | 16.8
2011-04-02 | 17.
...
2012-03-31 | 17.4
For the above data, I want to group by Fiscal Year 2010-2011 and Fiscal Year 2011-2012 without creating an extra column.
The first thing you want to do is define a function that outputs the financial year as a value. You could use the following.
def getFiscalYear(dt):
    year = dt.year
    if dt.month < 4:
        year -= 1
    return year
You say you don't want to use an extra column to group the frame. Typically the groupby method is called with something like df.groupby("colname"); however, that statement is semantically equivalent to df.groupby(df["colname"]), meaning you can do something like this:
grouped = DT.groupby(DT['Date'].apply(getFiscalYear))
and then apply a method to the groups or whatever you want to do. If you just want these groups separated call grouped.groups
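The same grouping key can also be built without apply, using the .dt accessor. The following is a minimal sketch that assumes the column is called Date, as in the sample table, and uses only the four rows shown there; fiscal_year and means are names made up for the example.

```python
import pandas as pd

# The four rows visible in the question's sample table.
DT = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]
        ),
        "PE_LOW": [15.44, 16.8, 17.0, 17.4],
    }
)

# January to March belong to the fiscal year that started the previous April.
fiscal_year = DT["Date"].dt.year - (DT["Date"].dt.month < 4).astype(int)

# Group by the computed Series directly; no extra column is added to DT.
means = DT.groupby(fiscal_year)["PE_LOW"].mean()
```

Here fiscal year 2010 stands for April 2010 to March 2011, matching what getFiscalYear returns.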
With pandas.DatetimeIndex, that is very simple:
DT.groupby(pd.DatetimeIndex(DT.Date).shift(-3,freq='m').year)
Or if you use Date as an index of DT, it is even simpler:
DT.groupby(DT.index.shift(-3,freq='m').year)
But beware that shift(-3,freq='m') shifts date to ends of months; for example, 8 Apr to 31 Jan and so on. Anyway, it fits your problem well.
I had a similar problem and used the following to offset the business year end to March (month=3) using Grouper and specifying the frequency:
grouped_df = df.groupby([pd.Grouper(key='DateColumn', freq=pd.tseries.offsets.BYearEnd(month=3))])
See also: Pandas Business Year End and Grouper.
The simplest method I've found for this (similar to Alex's answer, but slightly more concise):
df.groupby([pd.Grouper(key='DateColumn', freq="A-MAR")])
If you want year finishing on the last working day you can use freq="BA-MAR"
Similar to this answer, but I need the fiscal year to be labelled with the year in which it ends (at the time of this initial post, I would need to report the fiscal year as 2023). This is achieved by reversing the inequality and changing the decrement to an increment.
def fiscal_year(dt):
    year = dt.year
    # April onwards belongs to the fiscal year ending next March
    if dt.month >= 4:
        year += 1
    return year