cust_id start end subs_price_p_month
1 2019-01-01 2019-12-10 50.00
1 2020-02-03 2020-08-05 39.99
2 2019-12-11 2020-11-08 29.99
I would like to "unstack" the table above, so that each row contains the subs price for 1 month:
cust_id month subs_price_p_month
1 2019-01-01 50.00
1 2019-02-01 50.00
1 2019-03-01 50.00
....
1 2019-12-01 50.00
1 2020-02-01 39.99
1 2020-03-01 39.99
1 2020-04-01 39.99
....
1 2020-08-01 39.99
2 2019-12-01 29.99
2 2020-01-01 29.99
2 2020-02-01 29.99
...
2 2020-11-01 29.99
Text explanation:
Customer ID 1 has 2 subscriptions with different prices. The first runs from 1 January 2019 until 10 December 2019, the second from 3 February 2020 to 5 August 2020.
Customer ID 2 has only 1 subscription, from December 2019 to November 2020.
I want each row to represent one customer ID and one month, for easier data manipulation.
generate_series() generates the sequence of dates that you need. However, it is tricky to get the date arithmetic just right for your results.
You seem to want:
select t.cust_id, gs.yyyymm, t.subs_price_p_month
from t cross join lateral
     generate_series(date_trunc('month', t.startd),
                     date_trunc('month', t.endd),
                     interval '1 month'
                    ) gs(yyyymm);
However, if there are multiple rows within the same month, you would get duplicates. This question does not clarify what to do in that case. If you need to handle that case, I would suggest asking a new question.
Here is a db<>fiddle.
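If that case ever did need handling, one possible approach (a sketch only, assuming you would keep a single row per customer and month with the highest price; the column names startd/endd follow the query above) is:
-- distinct on keeps one row per (cust_id, month); the order by picks the highest price
select distinct on (t.cust_id, gs.yyyymm)
       t.cust_id, gs.yyyymm, t.subs_price_p_month
from t cross join lateral
     generate_series(date_trunc('month', t.startd),
                     date_trunc('month', t.endd),
                     interval '1 month'
                    ) gs(yyyymm)
order by t.cust_id, gs.yyyymm, t.subs_price_p_month desc;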
Use generate_series with an interval of 1 month and the range given by the start and end dates, e.g.:
SELECT
  cust_id,
  generate_series(start_, end_, interval '1 month'),
  subs_price_p_month
FROM t;
cust_id | generate_series | subs_price_p_month
---------+------------------------+--------------------
1 | 2019-01-01 00:00:00+01 | 50.00
1 | 2019-02-01 00:00:00+01 | 50.00
1 | 2019-03-01 00:00:00+01 | 50.00
1 | 2019-04-01 00:00:00+02 | 50.00
1 | 2019-05-01 00:00:00+02 | 50.00
1 | 2019-06-01 00:00:00+02 | 50.00
1 | 2019-07-01 00:00:00+02 | 50.00
...
Formatting the dates as 'Month YYYY' might display your result set even better:
SELECT
  cust_id,
  to_char(generate_series(start_, end_, interval '1 month'), 'Month YYYY'),
  subs_price_p_month
FROM t;
cust_id | to_char | subs_price_p_month
---------+----------------+--------------------
1 | January 2019 | 50.00
1 | February 2019 | 50.00
1 | March 2019 | 50.00
1 | April 2019 | 50.00
1 | May 2019 | 50.00
1 | June 2019 | 50.00
1 | July 2019 | 50.00
...
Demo: db<>fiddle
I'd like to analyze some daily data by hydrologic year: From 1 September to 31 August. I've created a synthetic data set with:
import pandas as pd
t = pd.date_range(start='2015-01-01', freq='D', end='2021-09-03')
df = pd.DataFrame(index = t)
df['hydro_year'] = df.index.year
df.loc[df.index.month >= 9, 'hydro_year'] += 1
df['id'] = df['hydro_year'] - df.index.year[0]
df['count'] = 1
Note that in reality I do not have a hydro_year column, so I cannot simply use groupby. I would expect the following to resample by hydrologic year:
print(df['2015-09-01':].resample('12M').agg({'hydro_year':'mean','id':'mean','count':'sum'}))
But the output does not align:
| | hydro_year | id | count |
|---------------------+------------+---------+-------|
| 2015-09-30 00:00:00 | 2016 | 1 | 30 |
| 2016-09-30 00:00:00 | 2016.08 | 1.08197 | 366 |
| 2017-09-30 00:00:00 | 2017.08 | 2.08219 | 365 |
| 2018-09-30 00:00:00 | 2018.08 | 3.08219 | 365 |
| 2019-09-30 00:00:00 | 2019.08 | 4.08219 | 365 |
| 2020-09-30 00:00:00 | 2020.08 | 5.08197 | 366 |
| 2021-09-30 00:00:00 | 2021.01 | 6.00888 | 338 |
However, if I start a day earlier, then things do align, except the first day is 'early' and dangling alone...
| | hydro_year | id | count |
|---------------------+------------+----+-------|
| 2015-08-31 00:00:00 | 2015 | 0 | 1 |
| 2016-08-31 00:00:00 | 2016 | 1 | 366 |
| 2017-08-31 00:00:00 | 2017 | 2 | 365 |
| 2018-08-31 00:00:00 | 2018 | 3 | 365 |
| 2019-08-31 00:00:00 | 2019 | 4 | 365 |
| 2020-08-31 00:00:00 | 2020 | 5 | 366 |
| 2021-08-31 00:00:00 | 2021 | 6 | 365 |
| 2022-08-31 00:00:00 | 2022 | 7 | 3 |
IIUC, you can use 12MS (month Start) instead of 12M: 12M anchors each bin to a month end, so the first bin closes on 2015-09-30 and everything shifts, while 12MS anchors the bins to month starts and keeps the hydrologic years aligned:
>>> df['2015-09-01':].resample('12MS') \
.agg({'hydro_year':'mean','id':'mean','count':'sum'})
hydro_year id count
2015-09-01 2016.0 1.0 366
2016-09-01 2017.0 2.0 365
2017-09-01 2018.0 3.0 365
2018-09-01 2019.0 4.0 365
2019-09-01 2020.0 5.0 366
2020-09-01 2021.0 6.0 365
2021-09-01 2022.0 7.0 3
We can also use an anchored offset, annual and anchored at the start of September (AS-SEP):
resampled_df = df['2015-09-01':].resample('AS-SEP').agg({
'hydro_year': 'mean', 'id': 'mean', 'count': 'sum'
})
hydro_year id count
2015-09-01 2016.0 1.0 366
2016-09-01 2017.0 2.0 365
2017-09-01 2018.0 3.0 365
2018-09-01 2019.0 4.0 365
2019-09-01 2020.0 5.0 366
2020-09-01 2021.0 6.0 365
2021-09-01 2022.0 7.0 3
Date from Date to
2018-12-11 2019-01-08
2019-01-08 2019-02-09
2019-02-10 2019-03-14
2019-03-17 2019-04-11
2019-04-15 2019-05-16
2019-05-16 2019-06-13
The output should be like this:
Date from Date to Days
2018-12-11 2019-01-08 0
2019-01-08 2019-02-09 1
2019-02-10 2019-03-14 3
2019-03-17 2019-04-11 4
2019-04-15 2019-05-16 0
2019-05-16 2019-06-13 -
To return the difference between two date values in days you could use the DATEDIFF() Function, something like:
SELECT DATEDIFF(DAY, DayFrom, DayTo) AS 'DaysBetween'
FROM DateTable
You want lead() and a date diff function:
select
  date_from,
  date_to,
  datediff(day, date_to, lead(date_from) over(order by date_from)) as days
from mytable
datediff() is a SQL Server function; there are equivalents in other RDBMSs.
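For example, in Postgres the same idea can be written with plain date subtraction (a sketch, assuming date_from and date_to are DATE columns, so subtracting them yields whole days):
-- PostgreSQL: date - date gives an integer number of days
select
  date_from,
  date_to,
  lead(date_from) over (order by date_from) - date_to as days
from mytable;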
Side note: I would recommend against using a string value (-) for records that do not have a next record, since the other values are numeric (the datatypes in a column must be consistent). null is good enough for this (and it is what the above query will produce).
Demo on DB Fiddle:
date_from | date_to | days
:------------------ | :------------------ | ---:
11/12/2018 00:00:00 | 08/01/2019 00:00:00 | 0
08/01/2019 00:00:00 | 09/02/2019 00:00:00 | 1
10/02/2019 00:00:00 | 14/03/2019 00:00:00 | 3
17/03/2019 00:00:00 | 11/04/2019 00:00:00 | 4
15/04/2019 00:00:00 | 16/05/2019 00:00:00 | 0
16/05/2019 00:00:00 | 13/06/2019 00:00:00 | null
I am trying to calculate the date/time that falls 2 or more working hours from now, even if the calculation starts on a weekend or outside working hours. It should work like this:
working hours are from 8 am to 4 pm
if I start calculating at 3 pm on a Friday and add 2 working hours, the result should be Monday 9 am
if(@data_przyj>@WorkStart AND DATEPART(DATEADD(MINUTE,@ileNaZapytanie,@data_przyj)<@WorkFinish)
BEGIN
    while (DATEPART(dw, @CurrentDate)!=1 AND DATEPART(dw, @CurrentDate)!=7))
    BEGIN
        SET @CurrentDate = DATEADD(day, 1, @CurrentDate)
        SET @czyBylPrzeskok = 1
    END
    if (@czyBylPrzeskok = 1)
    BEGIN
        SET @LastDay = @CurrentDate
        SET @LastDay = DATEADD(MINUTE, datediff(MINUTE, DATEADD(dd, 0, DATEDIFF(MINUTE, 0, @data_przyj)), @WorkStart), @LastDay)
        SET @LastDay = DATEADD(HOUR, datediff(MINUTE, DATEADD(dd, 0, DATEDIFF(HOUR, 0, @data_przyj)), @WorkStart), @LastDay)
    END
    ELSE
    BEGIN
        SET @LastDay = DATEADD(MINUTE, @ileNaZapytanie, @data_przyj)
    END
    SET @IsCalculated = 1
END
else if(@data_przyj>@WorkStart AND DATEADD(MINUTE,@ileNaZapytanie,@data_przyj)>@WorkFinish)
BEGIN
    SET @LastDay = DATEADD(DD, 3, GETDATE());
    SET @IsCalculated = 1
END
else if(@data_przyj<@WorkStart)
BEGIN
    SET @LastDay = GETDATE();
    SET @IsCalculated = 1
END
END
EDIT:
For example, with working hours 8:00-16:00: starting from '2019-09-06 15:00', adding 2 working hours should give '2019-09-09 09:00'; starting from '2019-09-06 13:00', it should give '2019-09-06 15:00', etc.
The following solution uses a calendar table with working hours, then uses a rolling sum to accumulate each day's business hours and find the day on which you need to end.
Using a calendar table gives you the flexibility of having different business time periods per day and makes it very easy to add or remove holidays (see the update sketch after the sample calendar below).
Setup (calendar table):
IF OBJECT_ID('tempdb..#WorkingCalendar') IS NOT NULL
DROP TABLE #WorkingCalendar
CREATE TABLE #WorkingCalendar (
Date DATE PRIMARY KEY,
IsWorkingDay BIT,
WorkingStartTime DATETIME,
WorkingEndTime DATETIME)
SET DATEFIRST 1 -- 1: Monday, 7: Sunday
DECLARE @StartDate DATE = '2019-01-01'
DECLARE @EndDate DATE = '2030-01-01'
;WITH RecursiveDates AS
(
SELECT
GeneratedDate = @StartDate
UNION ALL
SELECT
GeneratedDate = DATEADD(DAY, 1, R.GeneratedDate)
FROM
RecursiveDates AS R
WHERE
R.GeneratedDate < @EndDate
)
INSERT INTO #WorkingCalendar (
Date,
IsWorkingDay,
WorkingStartTime,
WorkingEndTime)
SELECT
Date = R.GeneratedDate,
IsWorkingDay = CASE
WHEN DATEPART(WEEKDAY, R.GeneratedDate) BETWEEN 1 AND 5 THEN 1 -- From Monday to Friday
ELSE 0 END,
WorkingStartTime = CASE
WHEN DATEPART(WEEKDAY, R.GeneratedDate) BETWEEN 1 AND 5
THEN CONVERT(DATETIME, R.GeneratedDate) + CONVERT(DATETIME, '08:00:00') END,
WorkingEndTime = CASE
WHEN DATEPART(WEEKDAY, R.GeneratedDate) BETWEEN 1 AND 5
THEN CONVERT(DATETIME, R.GeneratedDate) + CONVERT(DATETIME, '16:00:00') END
FROM
RecursiveDates AS R
OPTION
(MAXRECURSION 0)
Generates a table like the following:
+------------+--------------+-------------------------+-------------------------+
| Date | IsWorkingDay | WorkingStartTime | WorkingEndTime |
+------------+--------------+-------------------------+-------------------------+
| 2019-01-01 | 1 | 2019-01-01 08:00:00.000 | 2019-01-01 16:00:00.000 |
| 2019-01-02 | 1 | 2019-01-02 08:00:00.000 | 2019-01-02 16:00:00.000 |
| 2019-01-03 | 1 | 2019-01-03 08:00:00.000 | 2019-01-03 16:00:00.000 |
| 2019-01-04 | 1 | 2019-01-04 08:00:00.000 | 2019-01-04 16:00:00.000 |
| 2019-01-05 | 0 | NULL | NULL |
| 2019-01-06 | 0 | NULL | NULL |
| 2019-01-07 | 1 | 2019-01-07 08:00:00.000 | 2019-01-07 16:00:00.000 |
| 2019-01-08 | 1 | 2019-01-08 08:00:00.000 | 2019-01-08 16:00:00.000 |
| 2019-01-09 | 1 | 2019-01-09 08:00:00.000 | 2019-01-09 16:00:00.000 |
| 2019-01-10 | 1 | 2019-01-10 08:00:00.000 | 2019-01-10 16:00:00.000 |
| 2019-01-11 | 1 | 2019-01-11 08:00:00.000 | 2019-01-11 16:00:00.000 |
| 2019-01-12 | 0 | NULL | NULL |
| 2019-01-13 | 0 | NULL | NULL |
| 2019-01-14 | 1 | 2019-01-14 08:00:00.000 | 2019-01-14 16:00:00.000 |
| 2019-01-15 | 1 | 2019-01-15 08:00:00.000 | 2019-01-15 16:00:00.000 |
| 2019-01-16 | 1 | 2019-01-16 08:00:00.000 | 2019-01-16 16:00:00.000 |
| 2019-01-17 | 1 | 2019-01-17 08:00:00.000 | 2019-01-17 16:00:00.000 |
+------------+--------------+-------------------------+-------------------------+
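As an example of that flexibility, taking a public holiday out of the calendar is just an update against the table above (the holiday date here is only illustrative):
-- Mark a (hypothetical) public holiday as non-working
UPDATE #WorkingCalendar
SET IsWorkingDay = 0,
    WorkingStartTime = NULL,
    WorkingEndTime = NULL
WHERE Date = '2019-12-25'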
Proposed Solution:
DECLARE @v_BusinessHoursToAdd INT = 2
DECLARE @v_CurrentDateTimeHour DATETIME = '2019-09-06 15:00'
;WITH CalendarFromNow AS
(
SELECT
T.Date,
WorkingStartTime = CASE
WHEN @v_CurrentDateTimeHour BETWEEN T.WorkingStartTime AND T.WorkingEndTime THEN @v_CurrentDateTimeHour
ELSE T.WorkingStartTime END,
WorkingEndTime = T.WorkingEndTime
FROM
#WorkingCalendar AS T
WHERE
T.Date >= CONVERT(DATE, @v_CurrentDateTimeHour) AND
T.IsWorkingDay = 1
),
RollingBusinessSum AS
(
SELECT
C.Date,
C.WorkingStartTime,
C.WorkingEndTime,
AmountBusinessHours = DATEDIFF(HOUR, C.WorkingStartTime, C.WorkingEndTime),
RollingBusinessHoursSum = SUM(DATEDIFF(HOUR, C.WorkingStartTime, C.WorkingEndTime)) OVER (ORDER BY C.Date),
PendingHours = @v_BusinessHoursToAdd - SUM(DATEDIFF(HOUR, C.WorkingStartTime, C.WorkingEndTime)) OVER (ORDER BY C.Date)
FROM
CalendarFromNow AS C
)
SELECT TOP 1
EndingHour = DATEADD(
HOUR,
R.PendingHours,
R.WorkingEndTime)
FROM
RollingBusinessSum AS R
WHERE
R.PendingHours < 0
ORDER BY
R.Date
Explanation:
The first CTE, CalendarFromNow, simply filters the calendar to working dates from the current datetime onwards and raises the first day's working start time to the current hour, since that is the point from which we start counting hours.
+------------+-------------------------+-------------------------+
| Date | WorkingStartTime | WorkingEndTime |
+------------+-------------------------+-------------------------+
| 2019-09-06 | 2019-09-06 15:00:00.000 | 2019-09-06 16:00:00.000 |
| 2019-09-09 | 2019-09-09 08:00:00.000 | 2019-09-09 16:00:00.000 |
| 2019-09-10 | 2019-09-10 08:00:00.000 | 2019-09-10 16:00:00.000 |
| 2019-09-11 | 2019-09-11 08:00:00.000 | 2019-09-11 16:00:00.000 |
| 2019-09-12 | 2019-09-12 08:00:00.000 | 2019-09-12 16:00:00.000 |
| 2019-09-13 | 2019-09-13 08:00:00.000 | 2019-09-13 16:00:00.000 |
| 2019-09-16 | 2019-09-16 08:00:00.000 | 2019-09-16 16:00:00.000 |
+------------+-------------------------+-------------------------+
The second CTE, RollingBusinessSum, calculates the number of business hours in each day and accumulates them over the days. The last column, PendingHours, is the number of hours we need to add from now minus that running sum of business hours.
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+
| Date | WorkingStartTime | WorkingEndTime | AmountBusinessHours | RollingBusinessHoursSum | PendingHours |
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+
| 2019-09-06 | 2019-09-06 15:00:00.000 | 2019-09-06 16:00:00.000 | 1 | 1 | 1 |
| 2019-09-09 | 2019-09-09 08:00:00.000 | 2019-09-09 16:00:00.000 | 8 | 9 | -7 |
| 2019-09-10 | 2019-09-10 08:00:00.000 | 2019-09-10 16:00:00.000 | 8 | 17 | -15 |
| 2019-09-11 | 2019-09-11 08:00:00.000 | 2019-09-11 16:00:00.000 | 8 | 25 | -23 |
| 2019-09-12 | 2019-09-12 08:00:00.000 | 2019-09-12 16:00:00.000 | 8 | 33 | -31 |
| 2019-09-13 | 2019-09-13 08:00:00.000 | 2019-09-13 16:00:00.000 | 8 | 41 | -39 |
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+
Finally, the first day on which PendingHours turns negative is the day we reach the required number of hours; that is what the TOP 1 with ORDER BY selects. To get the final datetime, we add the (negative) pending hours to that day's end time.
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+-------------------------+
| Date | WorkingStartTime | WorkingEndTime | AmountBusinessHours | RollingBusinessHoursSum | PendingHours | EndingHour |
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+-------------------------+
| 2019-09-09 | 2019-09-09 08:00:00.000 | 2019-09-09 16:00:00.000 | 8 | 9 | -7 | 2019-09-09 09:00:00.000 |
+------------+-------------------------+-------------------------+---------------------+-------------------------+--------------+-------------------------+
You might have to tune performance and test boundary cases, but this should give you a flexible way of handling working hours with holidays or different time periods.
In Postgres the query below works, using the generate_series function:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
The query below also works in Oracle, but only with a whole-day interval:
select to_date('2019-03-01','YYYY-MM-DD') + rownum -1 as dates
from all_objects
where rownum <= to_date('2019-03-06','YYYY-MM-DD')-to_date('2019-03-01','YYYY-MM-DD')+1
I want the same result in Oracle for the query below:
SELECT dates
FROM generate_series(CAST('2019-03-01' as TIMESTAMP), CAST('2019-04-01' as TIMESTAMP), interval '30 mins') AS dates
Use a hierarchical query:
SELECT DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE AS dates
FROM DUAL
CONNECT BY DATE '2019-03-01' + ( LEVEL - 1 ) * INTERVAL '30' MINUTE <= DATE '2019-04-01';
Output:
| DATES |
| :------------------ |
| 2019-03-01 00:00:00 |
| 2019-03-01 00:30:00 |
| 2019-03-01 01:00:00 |
| 2019-03-01 01:30:00 |
| 2019-03-01 02:00:00 |
| 2019-03-01 02:30:00 |
| 2019-03-01 03:00:00 |
| 2019-03-01 03:30:00 |
| 2019-03-01 04:00:00 |
| 2019-03-01 04:30:00 |
| 2019-03-01 05:00:00 |
| 2019-03-01 05:30:00 |
...
| 2019-03-31 19:30:00 |
| 2019-03-31 20:00:00 |
| 2019-03-31 20:30:00 |
| 2019-03-31 21:00:00 |
| 2019-03-31 21:30:00 |
| 2019-03-31 22:00:00 |
| 2019-03-31 22:30:00 |
| 2019-03-31 23:00:00 |
| 2019-03-31 23:30:00 |
| 2019-04-01 00:00:00 |
db<>fiddle here
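As an aside, on Oracle 11.2 or later the same series can also be produced with a recursive subquery factoring (WITH) clause; a sketch:
-- Recursive WITH: the anchor row is the series start,
-- each recursive step adds 30 minutes up to the end timestamp.
WITH dates (d) AS (
  SELECT TIMESTAMP '2019-03-01 00:00:00' FROM DUAL
  UNION ALL
  SELECT d + INTERVAL '30' MINUTE
  FROM dates
  WHERE d + INTERVAL '30' MINUTE <= TIMESTAMP '2019-04-01 00:00:00'
)
SELECT d AS dates FROM dates;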
If I want to select a date relative to today's date I can do something like:
DateAdd(month, -2, N'1-Jan-2019')
This will give me the 1st of November 2018.
How would I get the Date of the 1st of September, from the previous year?
E.g.
Say it's July 2019,
I want the 1st of September 2018, NOT 2019.
However,
Say it's November 2019,
I want the 1st of September 2019, NOT 2018.
How is this possible?
You can do this by subtracting 8 months from your date value and then using the resulting year to build up your September date: shifting back 8 months puts any date from September onwards into the same calendar year and any date from January to August into the previous year, which is exactly the year of the wanted 1st of September.
declare @d table(d date);
insert into @d values ('20170101'),('20180101'),('20181101'),('20190101'),('20191001'),('20190901'),('20190921'),('20190808');
select d
,datefromparts(year(dateadd(month,-8,d)),9,1) as PrevSeptDate
,datetimefromparts(year(dateadd(month,-8,d)),9,1,0,0,0,0) as PrevSeptDateTime
from @d
order by d;
Output
+------------+--------------+-------------------------+
| d | PrevSeptDate | PrevSeptDateTime |
+------------+--------------+-------------------------+
| 2017-01-01 | 2016-09-01 | 2016-09-01 00:00:00.000 |
| 2018-01-01 | 2017-09-01 | 2017-09-01 00:00:00.000 |
| 2018-11-01 | 2018-09-01 | 2018-09-01 00:00:00.000 |
| 2019-01-01 | 2018-09-01 | 2018-09-01 00:00:00.000 |
| 2019-08-08 | 2018-09-01 | 2018-09-01 00:00:00.000 |
| 2019-09-01 | 2019-09-01 | 2019-09-01 00:00:00.000 |
| 2019-09-21 | 2019-09-01 | 2019-09-01 00:00:00.000 |
| 2019-10-01 | 2019-09-01 | 2019-09-01 00:00:00.000 |
+------------+--------------+-------------------------+