Averaging the data across two calendar years and defining the beginning month - pandas

I have data for the period from December 2013 to November 2018. I converted it into a DataFrame as shown here.
Date 0.1 0.2 0.3 0.4 0.5 0.6
2013-12-01 301.04 297.4 296.63 295.76 295.25 295.25
2013-12-04 297.96 297.15 296.25 295.25 294.43 293.45
2013-12-05 298.4 297.61 296.65 295.81 294.75 293.89
2013-12-08 298.82 297.95 297.15 296.25 295.45 294.41
2013-12-09 298.65 297.65 296.95 296.02 295.13 294.05
2013-12-12 299.05 297.33 296.65 295.81 294.85 293.85
2013-12-16 301.05 300.28 299.38 298.45 297.65 296.51
....
2014-01-10 301.65 297.45 296.46 295.52 294.65 293.56
2014-01-11 301.99 298.95 298.39 297.15 296.05 295.11
2014-01-12 299.86 298.65 297.73 296.82 296.35 295.37
2014-01-13 299.25 298.15 297.3 296.43 295.26 294.31
I want to take the monthly mean and the seasonal mean of this data.
For the monthly mean I have tried
df.resample('M').mean()
And it worked well.
For the seasonal mean, I would like to decompose this data into 4 seasons (Dec-Feb, Mar-May, Jun-Aug, Sep-Nov), each spanning three months. I tried resample with a 3-month interval, i.e.
df.resample('3M').mean()
However, this did not work well: it gives the average for the starting December month separately and then uses calendar-year intervals (i.e. January to March, and so on).
I would like to know if there is a way to avoid this by specifying the month in which the period of consideration begins.
Moreover, I would also like to know whether these seasons can be defined beforehand so that the data can be grouped accordingly to get the averages more easily.

You can define the origin in resample:
df.resample('3M', origin=pd.Timestamp('2013-12-01')).mean()
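
If the 3-month bins still do not line up on your pandas version, two other approaches map directly onto those seasons; a minimal sketch, assuming df is the frame from the question:

import pandas as pd

# Quarters anchored to end in Feb/May/Aug/Nov, i.e. the DJF, MAM, JJA, SON seasons.
seasonal = df.resample('Q-NOV').mean()

# Or define the seasons up front and group by them explicitly.
season_of = {12: 'DJF', 1: 'DJF', 2: 'DJF',
             3: 'MAM', 4: 'MAM', 5: 'MAM',
             6: 'JJA', 7: 'JJA', 8: 'JJA',
             9: 'SON', 10: 'SON', 11: 'SON'}
by_season = df.groupby(df.index.month.map(season_of)).mean()

Note that resample('Q-NOV') keeps each year's seasons separate, while the groupby pools every year of each season into a single average.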

Related

Finding Week-on-Week, Month-on-Month and Year-on-Year returns

I have a price series in a DataFrame with a datetime index. I am only showing the top 5 rows of the price series, but I have data going back to 2020-04-01.
Date          CTH3 Comdty
2022-11-28    78.95
2022-11-25    80.18
2022-11-23    82.90
2022-11-22    82.42
2022-11-21    79.78
So, for example, the weekly return for 2022-11-28 should be based on the price from 5 business days ago, i.e. 2022-11-21, and so (78.95 - 79.78)/79.78 = -1.04%.
I would like to calculate the week-on-week (WoW), month-on-month (MoM) and year-on-year (YoY) return for each day. For MoM and YoY, it should be based on the price from exactly 1 month or 1 year ago respectively, but if that day is not a business day and there is no price, then take the price from the day before, and so on. For this I know I can use .ffill or .bfill in some way.
My current thinking is to use .loc in a for loop to fill in the 1-week-ago, 1-month-ago and 1-year-ago prices as 3 different columns and then do the % calculation. But this seems a bit tedious. How would I go about doing this more efficiently?
Instead of .iloc, you can also try .iat.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html
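A vectorised sketch of the lookup, assuming df has an ascending DatetimeIndex and the "CTH3 Comdty" column from the sample (Series.asof does the fall-back to the most recent earlier trading day):

import pandas as pd

s = df["CTH3 Comdty"].sort_index()

def base_price(offset):
    # Price at exactly (date - offset), falling back to the last
    # available trading day at or before that date.
    return pd.Series(s.asof(s.index - offset).values, index=s.index)

returns = pd.DataFrame({
    "WoW": s / base_price(pd.DateOffset(weeks=1)) - 1,
    "MoM": s / base_price(pd.DateOffset(months=1)) - 1,
    "YoY": s / base_price(pd.DateOffset(years=1)) - 1,
})

For 2022-11-28 the WoW lookup lands on 2022-11-21, reproducing the -1.04% worked out in the question.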

Date_diff with specific condition time start and time end

Is it possible to use date_diff with a specific start and end time?
Let's say my store is open from 8 AM to 10 PM, which is 14 hours,
and I have a lot of stock to sell during that time. One of the SKUs is out of stock from 2022-11-01 06:00 PM until 11:00 AM the next day, 2022-11-02.
Instead of counting the full elapsed time, I just want to count the store's open hours until the item is restocked; from 6 PM to 11 AM that is 7 hours (4 hours before closing plus 3 hours after opening the next day).
my query
select date_diff('2022-11-02 11.00 AM', '2022-11-01 06.00 PM', hour) from table
with the result being 17 hours instead of the 7 hours I want.
There isn't a way to configure DATE_DIFF to do this for you, but it's possible to do what you want, with some effort.
You should convert your dates to timestamps (TIMESTAMP(yourdate) or CAST(yourdate AS TIMESTAMP)) and use TIMESTAMP_DIFF instead.
This will allow you to work with smaller intervals than days.
For your calculation, you ultimately need to find the total time difference between the two timestamps and then subtract the out-of-hours timeframe.
However, calculating the latter is not as simple as taking the difference in days and multiplying by the 10-hour closed window (10 PM to 8 AM), because the out-of-hours calculation also has to account for weekends and possibly holidays. Hence it can get quite complex.
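
For reference, the subtract-the-closed-hours logic can be sketched outside SQL; a minimal Python version, assuming the 8 AM-10 PM hours from the question and ignoring weekends and holidays:

from datetime import datetime, timedelta, time

OPEN, CLOSE = time(8, 0), time(22, 0)  # store hours assumed from the question

def open_hours_between(start: datetime, end: datetime) -> float:
    # Sum, day by day, the overlap between [start, end] and the open window.
    total = timedelta()
    day = start.date()
    while day <= end.date():
        lo = max(start, datetime.combine(day, OPEN))
        hi = min(end, datetime.combine(day, CLOSE))
        if hi > lo:
            total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 3600

print(open_hours_between(datetime(2022, 11, 1, 18), datetime(2022, 11, 2, 11)))  # 7.0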

Why is Redshift datediff giving different weeks when it is the same number of day difference?

I'm trying to find the number of weeks between two days. When the difference is 8 days, I should be getting 1 or 2 weeks, depending on how the function works in Redshift (rounds up or down). However, it should be consistent whichever way it chooses.
I realize that I could simply take the number of days, divide by 7, and do either a ROUND or a CEIL, but I am trying to understand why DATEDIFF(weeks, date1, date2) gives either 1 or 2 when the two dates are 8 days apart.
SELECT
DATEDIFF(weeks, '2019-03-17', '2019-03-25') AS week_difference1,
DATEDIFF(days, '2019-03-17', '2019-03-25') AS day_difference1,
DATEDIFF(weeks, '2019-03-16', '2019-03-24') AS week_difference2,
DATEDIFF(days, '2019-03-16', '2019-03-24') AS day_difference2
Result:
week_difference1 = 1
day_difference1 = 8
week_difference2 = 2
day_difference2 = 8
As with many software products from the US, the first day of the week in Redshift (at least as far as DATEDIFF is concerned) is Sunday, not the ISO-standard Monday. Therefore, when calculating the number of weeks between two dates, the boundary is Saturday/Sunday.
In your example, the eight days between the 16th March 2019 and 24th March 2019 crosses two week boundaries (one on 16/17 March and one on 23/24 March), so the resulting DATEDIFF value is 2 (two week boundaries crossed).
However, the eight days between the 17th March and 25th March only crosses one week boundary (23/24 March) so the resulting DATEDIFF value is 1.
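
You can sanity-check this boundary-counting behaviour with a small mock (a hypothetical Python stand-in for DATEDIFF(weeks, ...), not Redshift itself):

from datetime import date, timedelta

def week_boundaries_crossed(d1: date, d2: date) -> int:
    # Count the Sundays strictly after d1 and up to d2, i.e. the
    # Saturday/Sunday boundaries crossed.
    return sum(1 for i in range(1, (d2 - d1).days + 1)
               if (d1 + timedelta(days=i)).weekday() == 6)

print(week_boundaries_crossed(date(2019, 3, 17), date(2019, 3, 25)))  # 1
print(week_boundaries_crossed(date(2019, 3, 16), date(2019, 3, 24)))  # 2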

Dynamic Work shifts in Tabular model

I have a static calendar hierarchy in SSAS Tabular model with minutes, hours, days. This way I can do cool metric grouping by dates like:
2017-01-01 45
2017-01-02 3
2017-01-03 17
I want to be able to do groupings by work shifts, which are no more than 24 hours long. I could extend my calendar hierarchy with static shifts, but I need the work shifts to be user-definable on the fly.
For example user can set 22-hour shift from 03:00 to 01:00.
What approach should I take to include dynamic shifts into calendar hierarchy?

Days Unavailable

I need a simple SQL to accomplish the below:
Problem:
When a petrol bunk runs out of fuel, the admin makes note of the DateTime (RunOutDate) when it ran out of fuel and notes also the DateTime (ResupplyDate) when the fuel supply was back on.
I need to create a report on how many days the bunk was out of fuel.
eg.
1/1/1 10:10 to 1/1/1 10:50 should be counted as 1
1/1/1 10:10 to 2/1/1 07:20 should be counted as 2
1/1/1 23:55 to 2/1/1 00:10 should be counted as 2
I cannot rely on counting hours with DateDiff, as 24 hours could span 2 days.
TIA
DATEDIFF(d, RunOutDate, ResupplyDate) + 1
Remember that DATEDIFF always counts the number of BOUNDARIES that you cross. For days (first argument d), it counts the number of times the clock passes midnight, so to count the number of days covered you just add 1.
DATEDIFF using day, then add 1.
DATEDIFF counts midnight crossings, so you'll get 0, 1 and 1 for the three examples above; then add 1.
DATEDIFF(day, '16 Dec 2008 10:10', '16 Dec 2008 10:50') + 1
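
The same midnights-plus-one rule is easy to check outside SQL; a quick Python sketch, reading the sample's 1/1/1 dates as 2001-01-01:

from datetime import datetime

def days_unavailable(run_out: datetime, resupply: datetime) -> int:
    # Calendar days touched by the outage: midnight crossings + 1,
    # mirroring DATEDIFF(day, RunOutDate, ResupplyDate) + 1.
    return (resupply.date() - run_out.date()).days + 1

print(days_unavailable(datetime(2001, 1, 1, 10, 10), datetime(2001, 1, 1, 10, 50)))  # 1
print(days_unavailable(datetime(2001, 1, 1, 10, 10), datetime(2001, 1, 2, 7, 20)))   # 2
print(days_unavailable(datetime(2001, 1, 1, 23, 55), datetime(2001, 1, 2, 0, 10)))   # 2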