I have a dataframe with a datetimeindex. There are multiple observations on the same day but different times.
I'm familiar with the dayofyear attribute. Is there a way to use this attribute to also determine the max dayofyear by year? The result would be something like:
2015 252
2016 250
2017 251
If I understand your question, you want to look at a list of dates and, for each year, get the maximum day of year for that year.
import pandas as pd

# Sample data: hourly timestamps spanning the 2018/2019 year boundary
df = pd.DataFrame({'date': pd.date_range(start=pd.Timestamp(2018, 12, 24), end=pd.Timestamp(2019, 1, 2), freq='h')})
df['dayofyear'] = df.date.dt.dayofyear
df['year'] = df.date.dt.year
df.groupby('year').dayofyear.max()
Out:
year
2018 365
2019 2
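Since the question mentions a DatetimeIndex rather than a date column, here is a minimal sketch that groups the index's day of year by the index's year directly, without adding helper columns. The `value` column is a hypothetical placeholder for your observations.

```python
import pandas as pd

# Hourly timestamps stored as the index, as in the question
idx = pd.date_range(start="2018-12-24", end="2019-01-02", freq="h")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)  # 'value' is a placeholder column

# Day of year for each row, grouped by the year of the same index entry
max_doy = pd.Series(df.index.dayofyear, index=df.index).groupby(df.index.year).max()
print(max_doy)
```

This should give the same 365 / 2 result as the column-based version above.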
I'm trying to keep only the month and year in this df. I've tried several solutions but none of them work. Can you help me?
You need to do this (since you posted no data, you'll need to adapt it to your case):
from datetime import datetime
datetime_object = datetime.now()
print(datetime_object)
2021-11-30 15:57:20.812209
And to get the year and month do this:
new_date_month= datetime_object.month
print(new_date_month)
new_date_year = datetime_object.year
print(new_date_year)
11
2021
If you need them as new columns in your df (assuming your datetime column is called 'date'; adjust the name to your data):
df['year'] = df['date'].dt.year
df['Month'] = df['date'].dt.month
Note that if your column is not a datetime, this will not work. Given the format of your dates, you will need to convert them first:
st = '2021-11-30 15:57:20.812209'
datetime.strptime(st, '%Y-%m-%d %H:%M:%S.%f')
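Applied to a whole column, the same conversion can be done with `pd.to_datetime` and the same format string. The sketch below assumes a hypothetical string column named `date` with made-up values, since the question included no data. It shows both separate year/month columns and a single monthly `Period`, which is one way to "keep only month and year".

```python
import pandas as pd

# Hypothetical data: 'date' is a placeholder column of strings
df = pd.DataFrame({"date": ["2021-11-30 15:57:20.812209",
                            "2022-01-05 08:00:00.000000"]})

# Parse the strings with the same format used with strptime above
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S.%f")

# Separate numeric year and month columns...
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# ...or one monthly Period value that keeps only year and month
df["year_month"] = df["date"].dt.to_period("M")
print(df)
```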
I am familiar with dayofyear. However, this time I have dates that span 2 years (2018, 2019). I'd like to get a day of year that goes from 1 to 730 (365+365). For example, Jan 3rd, 2019 should be 368 and Jan 3rd, 2018 should be 3. Is there a built-in way to do this, or do I need to write a function manually?
Thanks
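Use the year as well: since 2018 is even and 2019 is odd, year % 2 is 1 only for 2019, so adding 365 times that value gives you 1 to 730:
import datetime as dt
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range(dt.datetime(2018, 1, 1), dt.datetime(2019, 12, 31))})
# Add 365 for the odd year (2019); assumes the first year is even and not a leap year
df["Date"].dt.dayofyear + (df["Date"].dt.year % 2) * 365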
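The even/odd trick only works for this particular pair of years. A more general sketch counts elapsed days from January 1st of the earliest year in the data (the anchor choice is an assumption; pick another start date if your numbering should begin elsewhere). Because it is a true day count, leap years are handled automatically.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2018-01-01", "2019-12-31")})

# Anchor at January 1st of the earliest year present in the data
start = pd.Timestamp(year=int(df["Date"].dt.year.min()), month=1, day=1)

# Whole days since the anchor, plus 1 so the anchor itself is day 1
df["day_number"] = (df["Date"] - start).dt.days + 1
print(df.tail())
```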
I've seen how to extract the day, month and year from formats such as "DD-MM-YYYY" (where the month is a number rather than a name).
However, I have a dataset with date values in the format "Month_name day, year".
Eg. "August 30, 2019".
Assume that your DataFrame contains a TxtDate column with date strings:
TxtDate
0 August 30, 2019
1 May 12, 2020
2 February 16, 2020
The first step is to convert the source column to datetime type and save it in a new column:
df['Date'] = pd.to_datetime(df.TxtDate)
pd.to_datetime is "clever" enough to parse these strings even without an explicit format specification.
Then extract the particular date components (and save them in their respective columns):
df['Year'] = df.Date.dt.year
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day
The last step is to drop the Date column (you didn't say that you need the whole date):
df.drop(columns='Date', inplace=True)
The result is:
TxtDate Year Month Day
0 August 30, 2019 2019 8 30
1 May 12, 2020 2020 5 12
2 February 16, 2020 2020 2 16
You may also want to drop the TxtDate column (your choice).
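If you prefer not to rely on format inference, you can pass the format explicitly. In this sketch, %B is the full month name, %d the day and %Y the four-digit year (month names are assumed to be in English). It also extracts the components from an intermediate Series, so there is no temporary Date column to drop afterwards.

```python
import pandas as pd

df = pd.DataFrame({"TxtDate": ["August 30, 2019", "May 12, 2020", "February 16, 2020"]})

# Explicit format: full month name, day, comma, four-digit year
dates = pd.to_datetime(df["TxtDate"], format="%B %d, %Y")

# Extract components directly from the parsed Series
df["Year"] = dates.dt.year
df["Month"] = dates.dt.month
df["Day"] = dates.dt.day
print(df)
```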
I can calculate the yearly averages, max and min values when the year starts on January 1st, like this:
yearly_avg=df2.groupby(years).mean()
yearly_sum=df2.groupby(years).sum()
yearly_MAX=df2.groupby(years).max()
yearly_MIN=df2.groupby(years).min()
I need to calculate averages, max and min values based on the water year, in which October 1st is the first day of the year. For an explanation of "water year", see: https://en.wikipedia.org/wiki/Water_year
My sample file is stored here:
https://drive.google.com/file/d/1AYi9vp3_DPXHoCPB_YkMQp68FvC_INrV/view?usp=sharing
How can I do that?
Thanks.
Just set the year to start in October instead of January. I am happy your columns were already datetime types; that made it easier!
df = pd.read_excel('sample_water_year.xlsx')
df['# YEAR'] = df.Dates.dt.to_period('A-Sep') #year ends on sep
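Note that pandas labels each 'A-Sep' period by the calendar year in which it ends (October 2010, for example, falls in period 2011), which is the usual water-year convention. If you would rather label each water year by the year in which it starts, subtract 1: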
df['# YEAR'] = df['# YEAR'] - 1
Then simply find the summary statistics:
yearly_avg = df.groupby('# YEAR').mean()
yearly_sum = df.groupby('# YEAR').sum()
yearly_MAX = df.groupby('# YEAR').max()
yearly_MIN = df.groupby('# YEAR').min()
Hopefully this helps!
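As a version-independent alternative to the period alias, the water year can be computed arithmetically: October to December dates belong to the next year's water year. This sketch uses a tiny synthetic frame as a stand-in for the spreadsheet; the numeric column name `Flow` is an assumption. It selects only the numeric column before aggregating, so the datetime column does not interfere with `sum`.

```python
import pandas as pd

# Synthetic stand-in for the sample file: a 'Dates' column plus a hypothetical 'Flow' column
df = pd.DataFrame({"Dates": pd.date_range("2010-09-29", "2010-10-02"),
                   "Flow": [1.0, 2.0, 3.0, 4.0]})

# Label each water year by the calendar year in which it ends
df["WaterYear"] = df["Dates"].dt.year + (df["Dates"].dt.month >= 10).astype(int)

# Summary statistics per water year on the numeric column only
stats = df.groupby("WaterYear")["Flow"].agg(["mean", "sum", "max", "min"])
print(stats)
```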
I have a dataframe DT with a datetime64 column called Date. Is it possible to use groupby to group by financial year, from April 1 to March 31?
For example,
Date | PE_LOW
2010-04-01 | 15.44
...
2011-03-31 | 16.8
2011-04-02 | 17.
...
2012-03-31 | 17.4
For the above data, I want to group by Fiscal Year 2010-2011 and Fiscal Year 2011-2012 without creating an extra column.
The first thing you want to do is define a function that outputs the financial year as a value. You could use the following.
def getFiscalYear(dt):
    year = dt.year
    # January to March belong to the fiscal year that started the previous April
    if dt.month < 4:
        year -= 1
    return year
You say you don't want to use an extra column to group the frame. Typically the groupby method is called with something like df.groupby("colname"); however, that statement is semantically equivalent to df.groupby(df["colname"]), meaning you can do something like this:
grouped = DT.groupby(DT['Date'].apply(getFiscalYear))
and then apply a method to the groups or whatever you want to do. If you just want to see the groups themselves, call grouped.groups.
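The same idea can be sketched without a row-by-row `apply`: compute the fiscal year as a vectorised Series and pass that Series to `groupby`, so nothing is added to DT. The rows below are the sample values from the question (the "..." rows are omitted), and averaging PE_LOW is only an illustrative aggregation.

```python
import pandas as pd

# Sample rows from the question
DT = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Vectorised getFiscalYear: months before April belong to the previous fiscal year
fiscal_year = DT["Date"].dt.year - (DT["Date"].dt.month < 4).astype(int)

# Group by the Series directly; DT itself gets no extra column
result = DT.groupby(fiscal_year)["PE_LOW"].mean()
print(result)
```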
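With pandas.DatetimeIndex, that is very simple:
DT.groupby(pd.DatetimeIndex(DT.Date).shift(-3, freq=pd.offsets.MonthEnd()).year)
Or if you use Date as the index of DT, it is even simpler:
DT.groupby(DT.index.shift(-3, freq=pd.offsets.MonthEnd()).year)
But beware that shifting by -3 month ends moves each date to the end of a month; for example, 8 Apr becomes 31 Jan, and 31 Mar becomes 31 Dec of the previous year. Either way, the resulting year is the calendar year in which the fiscal year starts, so it fits your problem well.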
I had a similar problem and used the following to offset the business year end to March (month=3) using Grouper and specifying the frequency:
grouped_df = df.groupby([pd.Grouper(key='DateColumn', freq=pd.tseries.offsets.BYearEnd(month=3))])
See the pandas documentation for Business Year End (BYearEnd) and Grouper.
The simplest method I've found for this (similar to Alex's answer, but slightly more concise):
df.groupby([pd.Grouper(key='DateColumn', freq="A-MAR")])
If you want the year to end on the last business day, use freq="BA-MAR" instead. In pandas 2.2 and later these aliases are deprecated in favour of "YE-MAR" and "BYE-MAR".
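A sketch of the Grouper approach on the question's sample rows: passing the offset object `pd.offsets.YearEnd(month=3)` (the offset behind the "A-MAR" alias) instead of the string avoids the alias renames between pandas versions. Each group is labelled by the March 31 on which its fiscal year ends.

```python
import pandas as pd

DT = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Fiscal years ending on March 31; labels are the year-end dates
grouper = pd.Grouper(key="Date", freq=pd.offsets.YearEnd(month=3))
result = DT.groupby(grouper)["PE_LOW"].mean()
print(result)
```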
Similar to this answer, but I would (at the time of this post) need to report that the fiscal year is 2023. This is achieved by reversing the inequality and changing the decrement to an increment.
def fiscal_year(dt):
    year = dt.year
    # April onwards belongs to the fiscal year that ends next March
    if dt.month >= 4:
        year += 1
    return year
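A vectorised counterpart of fiscal_year, sketched under the same April to March assumption, labels each fiscal year by the calendar year in which it ends. The boundary dates below are illustrative and show that March 31 and April 1 fall in different fiscal years.

```python
import pandas as pd

DT = pd.DataFrame({"Date": pd.to_datetime(["2022-03-31", "2022-04-01", "2023-03-31"])})

# Dates from April onwards belong to the next calendar year's fiscal year
fy = DT["Date"].dt.year + (DT["Date"].dt.month >= 4).astype(int)

# Use the Series as a grouping key without adding a column to DT
counts = DT.groupby(fy).size()
print(counts)
```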