Using groupby on a pandas DataFrame to group by financial year

I have a DataFrame, DT, with a datetime64 column called Date. Is it possible to use groupby to group by financial year, running from April 1 to March 31?
For example,
Date | PE_LOW
2010-04-01 | 15.44
...
2011-03-31 | 16.8
2011-04-02 | 17.
...
2012-03-31 | 17.4
For the above data, I want to group by Fiscal Year 2010-2011 and Fiscal Year 2011-2012 without creating an extra column.

The first thing you want to do is define a function that outputs the financial year as a value. You could use the following.
def getFiscalYear(dt):
    year = dt.year
    if dt.month < 4:
        year -= 1
    return year
You say you don't want to use an extra column to group the frame. Typically the groupby method is called as df.groupby("colname"); however, that statement is semantically equivalent to df.groupby(df["colname"]), which means you can pass any Series of keys computed on the fly, like this:
grouped = DT.groupby(DT['Date'].apply(getFiscalYear))
and then apply a method to the groups, or do whatever else you need. If you just want the groups separated, call grouped.groups.
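A minimal runnable sketch of this approach, assuming a DataFrame named DT with a Date column and a PE_LOW column; the four rows are illustrative values modelled on the sample in the question, not the asker's full data:

```python
import pandas as pd

def getFiscalYear(dt):
    """Return the starting calendar year of an April-March fiscal year."""
    year = dt.year
    if dt.month < 4:  # January-March belong to the fiscal year that began last April
        year -= 1
    return year

# Illustrative rows based on the sample in the question
DT = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# The grouping keys are computed on the fly; DT itself gains no extra column
grouped = DT.groupby(DT["Date"].apply(getFiscalYear))
mean_pe_low = grouped["PE_LOW"].mean()
print(mean_pe_low)
```

The keys here are the fiscal year's starting year (2010 for 2010-2011), which matches the "Fiscal Year 2010-2011" naming in the question.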

With pandas.DatetimeIndex, that is very simple:
DT.groupby(pd.DatetimeIndex(DT.Date).shift(-3,freq='m').year)
Or if you use Date as an index of DT, it is even simpler:
DT.groupby(DT.index.shift(-3,freq='m').year)
But beware that shift(-3, freq='m') also snaps dates to month ends; for example, 8 April becomes 31 January. Since only the year of the shifted date is used, this does not affect the fiscal-year grouping.
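The same shift can be written with an explicit offset object instead of the 'm' string alias. This is a sketch assuming a recent pandas, where the month-end alias was renamed (pandas 2.2 deprecated 'M' in favour of 'ME'), so passing pd.offsets.MonthEnd avoids depending on the alias spelling. The dates and values are illustrative:

```python
import pandas as pd

dates = pd.DatetimeIndex(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"])
values = pd.Series([15.44, 16.8, 17.0, 17.4], index=dates)

# Move every date back three month-ends: April-December stay in the same
# calendar year, January-March land in the previous one.
shifted = dates + pd.offsets.MonthEnd(-3)
fiscal_start_year = shifted.year

fy_sums = values.groupby(fiscal_start_year).sum()
print(fy_sums)
```

The month-end snapping mentioned above is visible in shifted (for example, 2010-04-01 becomes 2010-01-31), but only .year is used for grouping.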

I had a similar problem and used the following to offset the business year end to March (month=3) using Grouper and specifying the frequency:
grouped_df = df.groupby([pd.Grouper(key='DateColumn', freq=pd.tseries.offsets.BYearEnd(month=3))])
See the pandas documentation for Business Year End (BYearEnd) and Grouper.

The simplest method I've found for this (similar to Alex's answer, but slightly more concise):
df.groupby([pd.Grouper(key='DateColumn', freq="A-MAR")])
If you want the year to finish on the last working day, you can use freq="BA-MAR".
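A sketch of the Grouper approach using the offset object pd.offsets.YearEnd(month=3), which is what the "A-MAR" alias stands for (pandas 2.2 renamed the alias to "YE-MAR" and warns on the old spelling; likewise "BA-MAR" corresponds to BYearEnd(month=3)). DateColumn and the values are placeholders. Each group is labelled with the date on which its fiscal year ends:

```python
import pandas as pd

df = pd.DataFrame({
    "DateColumn": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Bins close on 31 March (closing date included); labels are the closing dates
fy = df.groupby(pd.Grouper(key="DateColumn", freq=pd.offsets.YearEnd(month=3)))["PE_LOW"].sum()
print(fy)
```

Using the offset object rather than a string keeps the code independent of which alias spelling your pandas version expects.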

Similar to the getFiscalYear answer, but I need the fiscal year labelled by the year in which it ends (at the time of this post, that is 2023). This is achieved by reversing the inequality and changing the decrement to an increment:
def fiscal_year(dt):
    year = dt.year
    if dt.month >= 4:
        year += 1
    return year
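For a whole column, the same end-year labelling can be computed without apply by adding a boolean to the year. This is a sketch assuming an April 1 start and a datetime Series like the Date column in the question; the dates are illustrative:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]))

# April-December count toward the fiscal year that ends next March,
# so add 1 to the calendar year whenever month >= 4 (True becomes 1).
fiscal_year_end = dates.dt.year + (dates.dt.month >= 4).astype(int)
print(fiscal_year_end.tolist())
```

The resulting Series can be passed straight to groupby, just like the apply-based keys.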

Related

Pandas DateTimeSlicing for specific months per year

I have read a lot about pandas and datetime slicing, but I haven't found a solution to my problem yet. I hope you can give me some good advice!
I have a DataFrame with a DatetimeIndex and, for example, a single column of floats. The time series covers about 60 years.
For example:
import numpy as np
import pandas as pd
idx = pd.Series(pd.date_range("2016-11-1", freq="M", periods=48))
dft = pd.DataFrame(np.random.randn(48, 1), columns=["NW"], index=idx)
I want to aggregate the column "NW" with sum() per month. I have two problems to solve:
1. The year begins in November and ends in October.
2. I have two periods per 12 months to analyse:
a) from November to the end of April in the following year, and
b) from May to the end of October in the same year.
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with pandas methods.
Do you have any tips?
Best regards, Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot of a time series with only the November-April sum for each year along the timeline (60 years until now), and the same for the values from May to October.
This is my solution so far. It is not the shortest way, I think, but it works fine.
from datetime import date

import pandas as pd

d_precipitation_winter = {}
# For each year except the current (incomplete) one
for year in dft.index.year.unique()[:-1]:
    # Start and end dates of the winter period (November to April)
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 4, 30)
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
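A vectorised alternative to the loop is to label every row with its half-year and a season year, then group once. This is a sketch, not the asker's method: the function name half_year_sums and the labels "winter"/"summer" are hypothetical, and a winter is labelled by the year of its November, matching the dictionary keys in the loop above. The monthly data are illustrative:

```python
import numpy as np
import pandas as pd

def half_year_sums(df, col="NW"):
    """Sum `col` over Nov-Apr ("winter") and May-Oct ("summer") half-years.

    Hypothetical helper: winters are labelled by the year of their November.
    """
    month = df.index.month
    year = df.index.year
    is_winter = (month >= 11) | (month <= 4)
    # January-April belong to the winter that started the previous November
    season_year = np.where(month <= 4, year - 1, year)
    season = np.where(is_winter, "winter", "summer")
    return df[col].groupby([season_year, season]).sum().unstack()

# Illustrative data: 24 months of 1.0 starting November 2016
idx = pd.date_range("2016-11-01", periods=24, freq="MS")
dft = pd.DataFrame({"NW": np.ones(24)}, index=idx)
sums = half_year_sums(dft)
print(sums)
```

Because the grouping keys only depend on the month, the same function works on daily observations; half-years that are only partly covered by the data still get a (partial) sum, so you may want to drop the first and last rows.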

How to include end date while calculating time difference in years?(Postgresql)

Please share your feedback on this problem. I need to calculate the difference in years and store it in a new column, 'Age'.
The formula usually works, but it gives me incorrect output when I use dates starting on 1 January of any year.
For example, the difference in years between 1 Jan 2019 and 31 Dec 2021 is 3 years when the end date is included in the calculation. My result shows 2 years.
Here are the 2 date columns from which I am deriving the difference:
However, when I use dates starting on 1 January, the result shows one year less:
Here is the code I used to calculate difference:
UPDATE animals
SET age = abs(benchmarkdate :: date - birthdate :: date)/ 365;
Any help would be appreciated. Thank you.
I would use EXTRACT here and then take a difference of only the year components on the two dates:
UPDATE animals
SET age = EXTRACT(year FROM benchmarkdate) - EXTRACT(year FROM birthdate);
Note that you might even want to avoid doing this update, and instead just compute the age when you select. If you foresee the need to frequently do such an update, that would be a good indicator that you probably should change your approach.
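To compare the ways of counting, here is a Python sketch of the arithmetic (Python rather than SQL, purely to check the numbers). In PostgreSQL, subtracting two dates yields an integer number of days, so the question's expression divides integers. The helper names are hypothetical, and the date pair 1 Jan 2021 to 31 Dec 2023 is illustrative:

```python
from datetime import date, timedelta

def days_over_365(start, end):
    """The question's formula: whole days divided by 365 (integer division)."""
    return abs((end - start).days) // 365

def year_component_difference(start, end):
    """What EXTRACT(year FROM end) - EXTRACT(year FROM start) computes."""
    return end.year - start.year

def full_years_inclusive(start, end):
    """Whole years from start to end, counting the end date itself.

    Counting the end date is the same as measuring up to the following day.
    """
    stop = end + timedelta(days=1)
    years = stop.year - start.year
    # Subtract one if the anniversary has not been reached yet
    if (stop.month, stop.day) < (start.month, start.day):
        years -= 1
    return years

start, end = date(2021, 1, 1), date(2023, 12, 31)
print(days_over_365(start, end), year_component_difference(start, end), full_years_inclusive(start, end))
```

The day-count version loses a year whenever a range of whole calendar years contains no leap day (1094 days here), and the year-component difference counts year boundaries, so it also gives one less than the inclusive count for such a range.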

Pandas max dayofyear by year

I have a dataframe with a datetimeindex. There are multiple observations on the same day but different times.
I'm familiar with the dayofyear attribute. Is there a way to use this attribute to also determine the max dayofyear by year? The result would be something like:
2015 252
2016 250
2017 251
If I understand your question, you want to look at a list of dates and, for each year, get the maximum day of year.
import pandas as pd

# Sample data: hourly timestamps spanning the year boundary
df = pd.DataFrame({'date': pd.date_range(start='2018-12-24', end='2019-01-02', freq='h')})
df['dayofyear'] = df.date.dt.dayofyear
df['year'] = df.date.dt.year
df.groupby('year').dayofyear.max()
Out:
year
2018 365
2019 2
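Since the question mentions a DatetimeIndex, here is a sketch that works on the index directly without adding helper columns; the hourly sample mirrors the one above and the column name value is a placeholder:

```python
import pandas as pd

# Same hourly sample as above, but held in a DatetimeIndex
idx = pd.date_range(start="2018-12-24", end="2019-01-02", freq="h")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Group the index's day-of-year by the index's year; no helper columns needed
max_doy = pd.Series(df.index.dayofyear, index=df.index).groupby(df.index.year).max()
print(max_doy)
```

Multiple observations on the same day simply share the same dayofyear value, so they do not affect the maximum.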

SSRS count working days only

I need some help with this. My case is:
1. Two parameters, date from and date to
2. A number-of-teams parameter that the user enters manually, for later use in some calculations
Requirement:
Count only working days (6 days per week, without Friday) within the filtered period (date from to date to).
Code
=(COUNT(IIF(Fields!Job_Status.Value="Closed",1,Nothing))) /
((DateDiff(DateInterval.day,Parameters!DateFrom.Value,Parameters!ToDate.Value
)) * (Parameters!Number_of_teams.Value))
Note
This code works fine, but it counts all days.
Thanks in advance.
Try this:
=(DATEDIFF(DateInterval.Day, CDATE("2016-02-14"), CDATE("2016-02-17")) + 1)
-(DATEDIFF(DateInterval.WeekOfYear, CDATE("2016-02-14"), CDATE("2016-02-17")) * 2)
-(IIF(WeekdayName(DatePart(DateInterval.Weekday,CDATE("2016-02-14"),FirstDayOfWeek.System))="sunday",1,0))
-(IIF(WeekdayName(DatePart(DateInterval.Weekday,CDATE("2016-02-17"),FirstDayOfWeek.System))="saturday",1,0))
It returns the count of Monday to Friday days in the given range; in the case above it returns 3. For StartDate = 2016-02-14 and EndDate = 2016-02-21 it returns 5.
UPDATE: Expression to exclude Friday from the count:
=(DATEDIFF(DateInterval.Day, Parameters!DateFrom.Value, Parameters!ToDate.Value) + 1)
-(DATEDIFF(DateInterval.WeekOfYear, Parameters!DateFrom.Value, Parameters!ToDate.Value) * 1)
-(IIF(WeekdayName(DatePart(DateInterval.Weekday,Parameters!ToDate.Value,FirstDayOfWeek.System))="friday",1,0))
Tested with:
DateFrom ToDate Result
2016-02-12 2016-02-19 6
2016-02-12 2016-02-18 6
2016-02-12 2016-02-15 3
It seems very strange to me to see Saturday and Sunday as working days instead of Friday.
Let me know if this helps you.
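To cross-check an expression like this outside SSRS, numpy's busday_count accepts a custom weekmask. This is a Python sketch, not SSRS code: the helper name working_days is hypothetical, and the weekmask "1111011" (Monday to Sunday) marks Friday as the only day off. The first three pairs are the tested ranges above; the fourth is an extra illustrative range that does not start on a Friday:

```python
import numpy as np

# Mon Tue Wed Thu Fri Sat Sun -> Friday is the only day off
NON_FRIDAY_WEEK = "1111011"

def working_days(date_from, date_to):
    """Count non-Friday days from date_from to date_to, both inclusive."""
    start = np.datetime64(date_from, "D")
    # busday_count excludes the end date, so step one day past date_to
    stop = np.datetime64(date_to, "D") + np.timedelta64(1, "D")
    return int(np.busday_count(start, stop, weekmask=NON_FRIDAY_WEEK))

for date_from, date_to in [("2016-02-12", "2016-02-19"),
                           ("2016-02-12", "2016-02-18"),
                           ("2016-02-12", "2016-02-15"),
                           ("2016-02-13", "2016-02-18")]:
    print(date_from, date_to, working_days(date_from, date_to))
```

Running extra ranges like the fourth one through both the expression and this count is a quick way to spot week-boundary edge cases.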
The most sustainable solution for this kind of question, in the long term, is to create a "date dimension" aka "calendar table". That way any quirks in the classification of dates that don't conform to some neat mathematical pattern can be accommodated. If your government decides to declare date X a public holiday starting from next year, just add it to your public holidays column (attribute). If you want to group by say "work days, weekends, and public holidays" no need to reinvent the wheel, just add that classification to the calendar table and everyone has the benefit of it and you don't need to worry about inconsistency in calculation/classification. You might want the first or last working day of the month. Easy, filter by that column in the calendar table.
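A minimal calendar table can be sketched in pandas to show the idea: one row per date plus classification columns that queries filter on. The function name build_calendar and the holiday date are hypothetical placeholders; in practice the holidays would come from your own source and the table would live in the database:

```python
import pandas as pd

# Hypothetical public holiday, for illustration only
PUBLIC_HOLIDAYS = {pd.Timestamp("2016-02-15")}

def build_calendar(start, end):
    """Build a minimal calendar table: one row per date plus classification columns."""
    dates = pd.date_range(start, end, freq="D")
    cal = pd.DataFrame({"date": dates})
    cal["weekday"] = cal["date"].dt.day_name()
    cal["is_public_holiday"] = cal["date"].isin(PUBLIC_HOLIDAYS)
    # Working week from the question: every day except Friday, minus holidays
    cal["is_working_day"] = (cal["weekday"] != "Friday") & ~cal["is_public_holiday"]
    return cal

cal = build_calendar("2016-02-01", "2016-02-29")
in_range = cal["date"].between(pd.Timestamp("2016-02-12"), pd.Timestamp("2016-02-19"))
print(cal.loc[in_range, "is_working_day"].sum())
```

Counting working days then becomes a filter and a sum, and a new holiday is a data change rather than a change to every report expression.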

Sort by month value in ssrs

I have a pretty awkward setup in SSRS with custom grouping options (e.g. by office, seller, buyer, etc.). For example, the user can group items by year and then by month. The outcome of the report is then:
Gr1. | Gr2.
2015 | April
2015 | August
2015 | February
2015 | January
and so on...
So both columns are sorted alphabetically, which works well for all custom grouping options except months, which need their own sorting logic. How could I implement that?
You should try to return all information from the database in its native type. Instead of returning the month names as ‘January’, ‘February’, etc., it would be better to return 20150101, 20150201, for example (or whatever the default date format is for your environment).
You can then change how the value is displayed in the report. For example, set up the cell to return
=MonthName(Month(Fields!myDate.Value),False)
to show the name of the month for any date. When sorting, SSRS then knows how to put the dates in the correct order.
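The same "sort on the real date, display the name" idea, sketched in pandas for illustration (the column name myDate mirrors the expression above; the rows are hypothetical):

```python
import pandas as pd

# Hypothetical rows returned with real dates instead of month names
df = pd.DataFrame({"myDate": pd.to_datetime(["2015-04-01", "2015-08-01", "2015-02-01", "2015-01-01"])})

# Sort on the date itself, then derive the display text (like MonthName(Month(...)))
ordered = df.sort_values("myDate")
labels = ordered["myDate"].dt.month_name().tolist()
print(labels)
```

The ordering comes from the date values, so the display text can be anything without affecting the sort.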
ALTERNATIVE
Assuming you do not want to change the way the data is returned, you can use a Select Case statement in the custom code behind your report to provide an explicit order for the months.
Suppose the dataset is:
Month Val
---------- ---
January 1
February 2
March 3
When sorted by Month, this returns the months alphabetically: February, January, March.
Instead, insert the following code into your report (right-click the report area, select Report Properties, then Code):
Public Function MonthNumber(MonthName As String) As Integer
    Dim MonthNum As Integer = 0
    Select Case MonthName
        Case "January"
            MonthNum = 1
        Case "February"
            MonthNum = 2
        Case "March"
            MonthNum = 3
    End Select
    Return MonthNum
End Function
Then set the Group Properties to be Sorted on
=Code.MonthNumber(Fields!Month.Value)
This gives January, February, March in calendar order.
This is because, when ordering the set, instead of just looking at the name of the month, the report passes the name through the code, which effectively tells the report which number to assign to each month. It is then ordered on this value instead of the month's name.
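The lookup-then-sort idea behind Code.MonthNumber can be sketched in Python for illustration; unlike the three-case example, this hypothetical mapping covers all twelve months, and like the VB function it returns 0 for an unknown name:

```python
# Lookup playing the role of Code.MonthNumber, extended to all twelve months
MONTH_NUMBER = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

def month_number(month_name):
    """Return 1-12 for an English month name, or 0 if the name is unknown."""
    return MONTH_NUMBER.get(month_name, 0)

months = ["April", "August", "February", "January"]
alphabetical = sorted(months)                    # what plain sorting on the name gives
calendar_order = sorted(months, key=month_number)  # sorting on the looked-up number
print(alphabetical, calendar_order)
```

Sorting on a derived key leaves the displayed value untouched, which is exactly what the group's sort expression does in the report.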
Hopefully this is useful to you. If you have any further questions, please let me know.