I can calculate the yearly averages, sums, max and min values using calendar years (a year starting on January 1st) like this:
yearly_avg=df2.groupby(years).mean()
yearly_sum=df2.groupby(years).sum()
yearly_MAX=df2.groupby(years).max()
yearly_MIN=df2.groupby(years).min()
I need to calculate the averages, max and min values based on the water year, where October 1st is the first day of the year. For an explanation of "water year", see: https://en.wikipedia.org/wiki/Water_year
My sample file is stored here:
https://drive.google.com/file/d/1AYi9vp3_DPXHoCPB_YkMQp68FvC_INrV/view?usp=sharing
How can I do that?
Thanks.
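Just set the year to start in October instead of January. Luckily your columns were already datetime types, which made this easier!
import pandas as pd
df = pd.read_excel('sample_water_year.xlsx')
df['# YEAR'] = df.Dates.dt.to_period('A-SEP')  # annual periods ending in September ('Y-SEP' in newer pandas)
Note that to_period('A-SEP') labels each period by the calendar year in which it ends, so October to December 2019 fall into period 2020. The linked Wikipedia article uses that same convention for water years; if you would rather label each water year by the year in which it starts, subtract 1: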
df['# YEAR'] = df['# YEAR'] - 1
Then simply find the summary statistics:
yearly_avg = df.groupby('# YEAR').mean()
yearly_sum = df.groupby('# YEAR').sum()
yearly_MAX = df.groupby('# YEAR').max()
yearly_MIN = df.groupby('# YEAR').min()
Hopefully this helps!
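If the 'A-SEP' alias is deprecated or unavailable in your pandas version, the water year can also be computed arithmetically. The sketch below is an alternative to to_period(), not the answer's method: it labels each water year by the calendar year in which it ends, which matches the to_period('A-SEP') labels before subtracting 1. The column name flow and the sample values are made up for illustration.

```python
import pandas as pd


def water_year(dates: pd.Series) -> pd.Series:
    """Label each date with its water year (Oct 1 to Sep 30), named by the year in which it ends."""
    # October to December belong to the water year that ends in the following calendar year
    return dates.dt.year + (dates.dt.month >= 10).astype(int)


# Hypothetical sample data: one value per date
df = pd.DataFrame({
    "Dates": pd.to_datetime(["2019-09-30", "2019-10-01", "2020-03-15", "2020-10-01"]),
    "flow": [1.0, 2.0, 4.0, 8.0],
})
df["water_year"] = water_year(df["Dates"])

# All four statistics per water year in one call
stats = df.groupby("water_year")["flow"].agg(["mean", "sum", "max", "min"])
print(stats)
```

Subtract 1 from water_year if you prefer labelling by the starting year, as the answer does.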
I have read a lot about pandas and datetime slicing, but I haven't found a solution to my problem yet. I hope you can give me some good advice!
I have a DataFrame with a DatetimeIndex and, for example, a single column of floats. The time series spans about 60 years.
For example:
import numpy as np
import pandas as pd
idx = pd.date_range("2016-11-1", freq="M", periods=48)  # 48 month-end dates ("ME" in pandas >= 2.2)
dft = pd.DataFrame(np.random.randn(48, 1), columns=["NW"], index=idx)
I want to aggregate the column "NW" with sum() per month. I have two problems to solve:
1. The year begins in November and ends in October.
2. I have two periods per 12 months to analyse:
a) from November to the end of April of the following year, and
b) from May to the end of October of the same year.
For example: "2019-11-1":"2020-4-30" and "2020-05-01":"2020-10-31"
I think I could write a function, but I wonder if there is an easier way to solve these problems with pandas methods.
Do you have any tips?
Best regards, Tommi.
Here is some additional information:
The real data are daily observations. I want to show a scatter plot of the time series containing only the sum() for each month from November to April along the timeline (60 years until now), and the same for the values from May to October.
This is my solution so far. It is probably not the shortest way, but it works fine.
import pandas as pd

d_precipitation_winter = {}
# for each year except the last (current) one
for year in dft.index.year.unique()[:-1]:
    # start and end dates of the winter period (1 November to 30 April)
    start_date = pd.Timestamp(year, 11, 1)
    end_date = pd.Timestamp(year + 1, 4, 30)
    dft_WH = dft.loc[start_date:end_date, :].sum()
    d_precipitation_winter[year] = dft_WH
df_precipitation_winter = pd.DataFrame(data=d_precipitation_winter)
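As a vectorised alternative to the loop, each date can be tagged with a hydrological year and a half-year name, then grouped once. This is a sketch under assumptions: the hydrological year runs November to October and is labelled here by the calendar year in which it ends (the loop above keys winters by their starting year instead), and the function names half_year_sums and monthly_sums_by_half are hypothetical.

```python
import numpy as np
import pandas as pd


def half_year_sums(s: pd.Series) -> pd.DataFrame:
    """Sum a daily series per hydrological year (Nov-Oct) and half-year (winter Nov-Apr, summer May-Oct)."""
    months = s.index.month
    # November and December belong to the hydrological year ending next October
    hyd_year = np.where(months >= 11, s.index.year + 1, s.index.year)
    season = np.where((months >= 11) | (months <= 4), "winter", "summer")
    return s.groupby([hyd_year, season]).sum().unstack()


def monthly_sums_by_half(s: pd.Series):
    """Monthly sums split into winter (Nov-Apr) and summer (May-Oct) months, e.g. for scatter plots."""
    monthly = s.groupby(s.index.to_period("M")).sum()
    is_winter = monthly.index.month.isin([11, 12, 1, 2, 3, 4])
    return monthly[is_winter], monthly[~is_winter]


# Example with one hydrological year of daily ones
s = pd.Series(1.0, index=pd.date_range("2019-11-01", "2020-10-31", freq="D"))
print(half_year_sums(s))
winter_months, summer_months = monthly_sums_by_half(s)
```

Grouping by period ("M") rather than resampling avoids the "M"/"ME" alias change between pandas versions.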
I have a table with the sales from the last 2 years, and I want to compare this year's sales with the same weekday ("natural day") last year. For example, Sunday 1st April 2018 should be compared with Sunday 2nd April 2017.
To do that, I created this measure:
sales_last_year = CALCULATE(Sales[Revenue]; SAMEPERIODLASTYEAR(DATEADD('Calendar'[Date];+1;DAY)))
I also created another measure that returns the value for the same calendar date last year:
Prueba_sales_last_year = CALCULATE(Sales[Revenue]; SAMEPERIODLASTYEAR('Calendar'[Date]))
The result is the following:
[Screenshot: Sales last year]
As you can see, the daily sales of 5.316 € and 3.546 € are correct, but the total of 111.796 € is not. With the other measure (without the weekday shift), the total is the correct sum of the two rows. How can I fix this?
Thank you very much in advance.
I solved it by changing the order of the date functions: apply SAMEPERIODLASTYEAR first, then shift the result by one day with DATEADD.
sales_last_year = CALCULATE(Sales[Revenue]; DATEADD(SAMEPERIODLASTYEAR('Calendar'[Date]); +1; DAY))
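For a single day, the date shift in this measure (one year back, then +1 day) can be checked in plain Python. The helper names are hypothetical, and mapping 29 February to 28 February is an assumption about SAMEPERIODLASTYEAR. The sketch also shows that this shift keeps the weekday only when the year in between has 365 days, whereas subtracting 364 days (52 weeks) always lands on the same weekday.

```python
from datetime import date, timedelta


def last_year_plus_one_day(d: date) -> date:
    """Mimic SAMEPERIODLASTYEAR followed by DATEADD(...; +1; DAY) for one date."""
    try:
        prev = d.replace(year=d.year - 1)
    except ValueError:
        # 29 February has no counterpart; assumed to map to 28 February
        prev = date(d.year - 1, 2, 28)
    return prev + timedelta(days=1)


def same_weekday_last_year(d: date) -> date:
    """364 days = 52 weeks, so the weekday is always preserved."""
    return d - timedelta(days=364)


for d in (date(2018, 4, 1), date(2020, 3, 1)):
    print(d, last_year_plus_one_day(d), same_weekday_last_year(d))
```

For 1 April 2018 both give Sunday 2 April 2017; for Sunday 1 March 2020 the one-year-plus-one-day shift lands on a Saturday because 29 February 2020 lies in between.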
I have a DimDate table with a Billable Day Portion field whose value is between 0 and 1. For each day in the current Bonus Period, I want to multiply that Day Portion by 10 and then return the total sum.
To find out what Bonus Period we're in, I return ContinuousBonusPeriod where the date equals today:
Current Continuous Bonus Period:= CALCULATE(MAX(DimDate[ContinuousBonusPeriod]), FILTER(DimDate, DimDate[DateKey] = TODAY()))
The measure display shows that this correctly returns Bonus Period 1. However, when I use ContinuousBonusPeriod in a second measure to sum over the days in the current period, it only returns 10, i.e. a single day multiplied by the static 10.
Billable Hours This Period:= CALCULATE(SUMX(DimDate, DimDate[Billable Day Portion] * 10), FILTER(DimDate, DimDate[ContinuousBonusPeriod] = [Current Continuous Bonus Period]))
It appears to count only today's DimDate record instead of all the records where ContinuousBonusPeriod = 'Bonus Period 1', as I'd expect.
I needed to make sure no existing filter was applied to the DimDate table when calculating the Current Continuous Bonus Period:
Current Continuous Bonus Period:= CALCULATE(MAX(DimDate[ContinuousBonusPeriod]), FILTER(ALL(DimDate), DimDate[DateKey] = TODAY()))
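(Note the ALL() around DimDate: when [Current Continuous Bonus Period] is evaluated inside the FILTER of the second measure, context transition restricts DimDate to the row being iterated, so without ALL() the lookup only succeeds on today's row.)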
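The intended logic of the two measures, with the current period looked up against the whole table (the role ALL() plays), can be sketched in pandas. The DataFrame and its values are hypothetical; only the column names come from the question.

```python
import pandas as pd


def billable_hours_this_period(dim_date: pd.DataFrame, today: pd.Timestamp) -> float:
    # Look up today's bonus period in the unfiltered table (like FILTER(ALL(DimDate), ...))
    current = dim_date.loc[dim_date["DateKey"] == today, "ContinuousBonusPeriod"].max()
    # Then sum Billable Day Portion * 10 over every day in that period
    in_period = dim_date["ContinuousBonusPeriod"] == current
    return float((dim_date.loc[in_period, "Billable Day Portion"] * 10).sum())


dim_date = pd.DataFrame({
    "DateKey": pd.date_range("2024-01-01", periods=4, freq="D"),
    "ContinuousBonusPeriod": ["Bonus Period 1", "Bonus Period 1", "Bonus Period 1", "Bonus Period 2"],
    "Billable Day Portion": [1.0, 0.5, 1.0, 1.0],
})
print(billable_hours_this_period(dim_date, pd.Timestamp("2024-01-02")))
```

Restricting the lookup to the current row only, as the original measure effectively did, would return just that row's portion times 10.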
I have a table in a data model with forecast figures for the next 3 months. I want to show the forecast for the current month to date.
When I use the DATESMTD function like this:
=CALCULATE(SUM(InternetSales_USD[SalesAmount_USD]),DATESMTD(DateTime[DateKey]))
I get the last month of my data summarised as a total. I assume that is because the DATESMTD function takes the last date in the column and that is 3 months away.
How do I get the MTD total for the current month rather than for the end of the calendar? The formula should realise that I am in May and return the May MTD, not the August MTD.
Any ideas?
One way to do this:
Forecast_Transaction_MTD:=CALCULATE(sum('ATO Online'[2017 Transaction Forecast]), DATESINPERIOD('ATO Online'[Current Year],TODAY(),-day(TODAY()),day))
The -day(TODAY()) argument is the negative of the current day number. For example, on 25 May day(TODAY()) returns 25, so DATESINPERIOD returns the 25 days ending today, i.e. 1 May to 25 May.
CALCULATE then sums the forecast over those dates.
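The month-to-date window can be sketched in pandas as well: subtracting day(today) - 1 days from today gives the first of the month, and the window then holds day(today) days, the count passed to DATESINPERIOD. The daily forecast series and the function name mtd_total are hypothetical.

```python
import pandas as pd


def mtd_total(daily: pd.Series, today) -> float:
    """Sum a daily series from the first of today's month up to and including today."""
    today = pd.Timestamp(today).normalize()
    start = today - pd.Timedelta(days=today.day - 1)  # first day of the current month
    return float(daily.loc[start:today].sum())


# Hypothetical forecast: 1.0 per day from May to August
forecast = pd.Series(1.0, index=pd.date_range("2017-05-01", "2017-08-31", freq="D"))
print(mtd_total(forecast, "2017-05-25"))
```

Because the window is anchored on today rather than on the last date in the column, forecast rows for later months do not affect the result.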
I have a scenario.
In the sample table below, I need to show the sales by year…
For each year, I also need to show the sales of the previous year and of the year before that.
For example:
2014: Current Year = 2014 Sales, Last Year = 2013 Sales
2013: Current Year = 2013 Sales, Last Year = 2012 Sales
|----------2013------------|---------2014-------------|
| Last Year | Current Year | Last Year | Current Year |
I've tried, but when I nest the measures under a year dimension the calculations don't work. Is there a way around this, to produce this kind of report format? Our user is very particular about having this format.
Many thanks for the help.
I'd simply hardcode all rows, and skip the year dimension:
Current Year:
Sum({< Date = {">=$(=YearStart(Min(Date), 0))"} * {"<=$(=AddYears(Max(Date), 0))"} >} SalesAmount)
Last Year:
Sum({< Date = {">=$(=YearStart(Min(Date), -1))"} * {"<=$(=AddYears(Max(Date), -1))"} >} SalesAmount)
-2 Year:
Sum({< Date = {">=$(=YearStart(Min(Date), -2))"} * {"<=$(=AddYears(Max(Date), -2))"} >} SalesAmount)
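To check the figures, the date windows these three expressions select can be reproduced in pandas: each year runs from 1 January to the same calendar day as the latest date. This sketch assumes a single year is selected, so YearStart(Min(Date), -n) and AddYears(Max(Date), -n) fall in the same year; the column names Date and SalesAmount follow the expressions, while the data and the function name ytd_by_year are made up.

```python
import pandas as pd


def ytd_by_year(sales: pd.DataFrame, as_of, years_back: int = 2) -> dict:
    """Year-to-date SalesAmount for the as_of year and the previous years_back years."""
    as_of = pd.Timestamp(as_of)
    result = {}
    for n in range(years_back + 1):
        end = as_of - pd.DateOffset(years=n)  # like AddYears(Max(Date), -n)
        start = pd.Timestamp(end.year, 1, 1)  # like YearStart(..., -n)
        mask = (sales["Date"] >= start) & (sales["Date"] <= end)
        result[end.year] = float(sales.loc[mask, "SalesAmount"].sum())
    return result


sales = pd.DataFrame({"Date": pd.date_range("2012-01-01", "2014-12-31", freq="D")})
sales["SalesAmount"] = 1.0
print(ytd_by_year(sales, "2014-03-31"))
```

With one unit of sales per day, 2012 comes out one higher than the other years because its window includes 29 February.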
I think this could be achieved using a pivot table. Here's an example.
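You can solve this problem in the load script, so that you can compare year-to-date data with the previous year up to the corresponding date.
Transaction_Table:
LOAD date, productID, amount FROM data.qvd (qvd);
// latest transaction date, stored in a variable for the WHERE clause below
Tmp_MaxDate: LOAD Max(date) AS MaxDate RESIDENT Transaction_Table;
LET vMaxDate = Peek('MaxDate', 0, 'Tmp_MaxDate');
DROP TABLE Tmp_MaxDate;
// last year's rows, shifted forward one year, loaded as amount_1
Concatenate (Transaction_Table)
LOAD AddYears(date, 1) AS date, productID, amount AS amount_1
FROM data.qvd (qvd) WHERE date <= AddYears($(vMaxDate), -1);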
Data_Table:
load distinct
date,
month(date) as Month,
year(date) as Year
resident Transaction_Table;
This gives two columns: "amount" holds the current date's data and "amount_1" holds the previous year's data for the same day.
Create a pivot chart with year across the top and product down the left, and add two expressions: one that aggregates amount_1 (the previous period) and one that aggregates amount (the current period).
You can label the expressions:
previous year label: =year-1
current year label: =year
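The same script-side idea, loading last year's rows again with their dates moved forward one year as amount_1, can be sketched in pandas to see the resulting table. The column names follow the script; the values and the function name add_previous_year are made up.

```python
import pandas as pd


def add_previous_year(tx: pd.DataFrame) -> pd.DataFrame:
    """Append last year's rows, shifted forward one year, with amount renamed to amount_1."""
    prev = tx.assign(date=tx["date"] + pd.DateOffset(years=1)).rename(columns={"amount": "amount_1"})
    # Keep only shifted rows up to the latest real date (the script's WHERE clause)
    prev = prev[prev["date"] <= tx["date"].max()]
    return pd.concat([tx, prev], ignore_index=True)


tx = pd.DataFrame({
    "date": pd.to_datetime(["2016-05-01", "2017-05-01"]),
    "productID": ["A", "A"],
    "amount": [10.0, 20.0],
})
combined = add_previous_year(tx)
# Pivot-style summary: one row per year, current and previous-year amounts side by side
summary = combined.groupby(combined["date"].dt.year)[["amount", "amount_1"]].sum()
print(summary)
```

Because both measures end up on the same shifted date, a single year dimension is enough to show them side by side.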