Get Start and End date from multiple rows of dates, excluding weekends - sql

I'm trying to figure out how to return the start date and end date based on data like the table below:
Name  Date From   Date To
----  ----------  ----------
A     2022-01-03  2022-01-03
A     2021-12-29  2021-12-31
A     2021-12-28  2021-12-28
A     2021-12-27  2021-12-27
A     2021-12-23  2021-12-24
A     2021-11-08  2021-11-09
The result I am after would show like this:
Name  Date From   Date To
----  ----------  ----------
A     2021-12-23  2022-01-03
A     2021-11-08  2021-11-09
The dates in first table will sometimes go over weekends with the Date From and Date To, but in cases where the row ends on a Friday and next row starts on following Monday it will need to be classified as the same "block", as presented in the second table. I was hoping to use DATEFIRST setting to cater for the weekends to avoid using a calendar table, as per How do I exclude Weekend days in a SQL Server query?, but if calendar table ends up being the easiest way out I'm happy to look into creating one.
In above example I only have 1 Name, but the table will have multiple names and it will need to be grouped by that.
The only examples of this I am seeing are using only 1 date column for records and I struggled changing their code around to cater for my example. The closest example I found doesn't work for me as it is based on datetime fields and the time differences - find start and stop date for contiguous dates in multiple rows

This is a gaps-and-islands problem, with the twist that a row ending on a Friday and the next row starting on the following Monday must be treated as continuous.
You can do:
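select max(name) as name, min(date_from) as date_from, max(date_to) as date_to
from (
    -- the running total of the break flags numbers each island
    select *, sum(inc) over(order by date_to) as grp
    from (
        -- inc = 0 when this row starts exactly on the previous row's extended end
        select *,
               case when lag(ext_to) over(order by date_to) = date_from
                    then 0 else 1 end as inc
        from (
            -- ext_to = next working day after date_to;
            -- weekday 6 is Friday under DATEFIRST 7 (the us_english default)
            select *,
                   case when (datepart(weekday, date_to) = 6)
                        then dateadd(day, 3, date_to)
                        else dateadd(day, 1, date_to) end as ext_to
            from t
        ) x
    ) y
) z
group by grp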
Result:
name date_from date_to
---- ---------- ----------
A 2021-11-08 2021-11-09
A 2021-12-23 2022-01-03
See running example at db<>fiddle #1.
Note: Your question doesn't mention it, but you probably want to segment per person. I didn't do it.
EDIT: Adding partition by name
Partitioning by name is quite easy actually. The following query does it:
select name, min(date_from) as date_from, max(date_to) as date_to
from (
    select *, sum(inc) over(partition by name order by date_to) as grp
    from (
        select *,
               case when lag(ext_to) over(partition by name order by date_to) = date_from
                    then 0 else 1 end as inc
        from (
            select *,
                   case when (datepart(weekday, date_to) = 6)
                        then dateadd(day, 3, date_to)
                        else dateadd(day, 1, date_to) end as ext_to
            from t
        ) x
    ) y
) z
group by name, grp
order by name, grp
See running query at db<>fiddle #2.

with extended as (
select name,
date_from,
case when datepart(weekday, date_to) = 6
then dateadd(day, 2, date_to) else date_to end as date_to
from t
), adjacent as (
select *,
case when dateadd(day, 1,
lag(date_to) over (partition by name order by date_from)) = date_from
then 0 else 1 end as brk
from extended
), blocked as (
select *, sum(brk) over (partition by name order by date_from) as grp
from adjacent
)
select name, min(date_from), max(date_to) from blocked
group by name, grp;
I'm assuming that ranges do not overlap and that all input dates fall on weekdays. While hammering this out on my cellphone I originally made two mistakes. For some reason I got the to and from dates reversed in my head, and then I was thinking that Friday is 5 (as with @@datefirst) rather than 6. (Of course this could otherwise vary with the regional setting anyway.) One advantage of using table expressions is that they modularize the query and bury certain details in lower levels of the logic. In this case it would be very easy to adjust the dates should some of these assumptions prove to be wrong.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=42e0c452d57d474232bcf991d6d3c43c
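Since both answers above compare datepart(weekday, ...) against a fixed number, the result depends on the session's DATEFIRST setting. A minimal sketch of one way to make the Friday test independent of that setting is to add @@datefirst and take the remainder modulo 7, which maps Saturday to 0, Sunday to 1, and so on up to Friday at 6. The variable @d is only an illustration and is not part of either answer.

```sql
-- 2021-12-24 is a Friday (it ends one of the rows in the question's sample data).
declare @d date = '2021-12-24';

set datefirst 1;  -- Monday is the first day of the week
select (datepart(weekday, @d) + @@datefirst) % 7 as dow;  -- 6

set datefirst 7;  -- Sunday is the first day of the week (us_english default)
select (datepart(weekday, @d) + @@datefirst) % 7 as dow;  -- 6
```

With this expression in place of datepart(weekday, date_to) = 6, the queries would not need to assume any particular regional setting.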

Related

How to differentiate iteration using date field in BigQuery

I have a process that occurs every 30 days but can take a few days.
How can I differentiate between each iteration in order to sum the output of the process?
For example, the output I expect is:
Name         Date        amount  iteration (optional)
-----------  ----------  ------  --------------------
Sophia Liu   2016-01-01  4       1
Sophia Liu   2016-02-01  5       2
Nikki Leith  2016-01-02  5       1
Nikki Leith  2016-02-01  10      2
I tried using the lag function on the date field and taking the difference between that column and the date column.
WITH base AS
(SELECT 'Sophia Liu' as name, DATE '2016-01-01' as date, 3 as amount
UNION ALL SELECT 'Sophia Liu', DATE '2016-01-02', 1
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-01', 3
UNION ALL SELECT 'Sophia Liu', DATE '2016-02-02', 2
UNION ALL SELECT 'Nikki Leith', DATE '2016-01-02', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-01', 5
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-02', 3
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-03', 1
UNION ALL SELECT 'Nikki Leith', DATE '2016-02-04', 1)
select
name
,date
,lag(date) over (partition by name order by date) as lag_func
,date_diff(date,lag(date) over (partition by name order by date),day) date_differacne
,case when date_diff(date,lag(date) over (partition by name order by date),day) >= 10
or date_diff(date,lag(date) over (partition by name order by date),day) is null then true else false end as new_iteration
,amount
from base
Edited answer
After your clarification, and looking at what's actually in your SQL code, I'm guessing you are looking for a solution to what's called a gaps and islands problem. That is, you want to identify the "islands" of activity and sum the amount for each iteration or island. Taking your example, you can first identify the start of a new iteration (the first row after a "gap") and then use that to create a unique iteration ("island") identifier for each user. You can then use that identifier to perform a SUM().
-- continues the WITH base AS (...) clause from the question
, gaps as (
select
name,
date,
amount,
if(date_diff(date, lag(date,1) over(partition by name order by date), DAY) >= 10, 1, 0) new_iteration
from base
),
islands as (
select
*,
1 + sum(new_iteration) over(partition by name order by date) iteration_id
from gaps
)
select
*,
sum(amount) over(partition by name, iteration_id) iteration_amount
from islands
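To get exactly the four-row shape shown in the question (one row per name and iteration), the islands can be aggregated instead of windowed. The following is a sketch along the same lines as the query above, using the question's sample rows; the column alias iteration is my own naming.

```sql
with base as (
  select 'Sophia Liu' as name, date '2016-01-01' as date, 3 as amount
  union all select 'Sophia Liu', date '2016-01-02', 1
  union all select 'Sophia Liu', date '2016-02-01', 3
  union all select 'Sophia Liu', date '2016-02-02', 2
  union all select 'Nikki Leith', date '2016-01-02', 5
  union all select 'Nikki Leith', date '2016-02-01', 5
  union all select 'Nikki Leith', date '2016-02-02', 3
  union all select 'Nikki Leith', date '2016-02-03', 1
  union all select 'Nikki Leith', date '2016-02-04', 1
),
gaps as (
  -- flag a row as the start of a new iteration when it is 10+ days after the previous row
  select name, date, amount,
         if(date_diff(date, lag(date) over (partition by name order by date), day) >= 10, 1, 0) as new_iteration
  from base
),
islands as (
  -- the running total of the flags numbers the iterations 1, 2, ... per name
  select *, 1 + sum(new_iteration) over (partition by name order by date) as iteration
  from gaps
)
select name, min(date) as date, sum(amount) as amount, iteration
from islands
group by name, iteration
order by name, iteration;
```

The first row of each name has no previous date, so the date_diff is NULL and the IF falls through to 0, which is why the numbering starts at 1.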
Previous answer
Sounds like you just need a RANK() to count the iterations in your window functions. Depending on your need you can then sum cumulative or total amounts in a similar window function. Something like this:
select
name
,date
,rank() over (partition by name order by date) as iteration
,sum(amount) over (partition by name order by date) as cumulative_amount
,sum(amount) over (partition by name) as total_amount
,amount
from base

How can I find dates ranges with no data from an effective dated table with SQL?

This is a little bit confusing, so I'll try to clarify.
Let's say I have an employee table like this:
employee  eff_Dt      end_effective_date
--------  ----------  ------------------
1         1900-01-01  2020-12-31
1         2021-01-01  2021-02-01
1         2021-02-02  9999-01-01
2         1900-01-01  9999-01-01
3         1900-01-01  2015-12-31
3         2016-01-01  2020-01-01
4         1900-01-01  2016-01-01
4         2018-01-01  9999-01-01
Employees 1 and 2 are fine. They have a full effective-dated history from 1900-01-01 to 9999-01-01. All of my employee records need that.
The SQL I need is to find records like 3 and 4. In the case of employee 3, we are missing the data from 2020-01-02 to 9999-01-01 and for employee 4 we are missing data from 2016-01-02 to 2017-12-31.
How can I develop a query that will return these records? I am on Oracle SQL. I would prefer an ANSI SQL solution if possible, but if the best solution uses Oracle-specific functions, then it is what it is. I do not have access to create indices or stored procedures, so this can only be done via a query.
Thank you in advance.
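I'd suggest counting days. Each row covers end_effective_date - eff_dt + 1 days (both endpoints inclusive), and 2958100 is the number of days from 1900-01-01 to 9999-01-01 inclusive. Assuming an employee's periods never overlap, any employee whose total is smaller than that has a gap.
This query will return employees 3 and 4:
select employee, sum(end_effective_date - eff_dt + 1) as days_covered
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < 2958100;
UPD: Same query without hard-coded values:
select employee, sum(end_effective_date - eff_dt + 1) as days_covered
from test
group by employee
having sum(end_effective_date - eff_dt + 1) < (date '9999-01-01' - date '1900-01-01' + 1);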
If you want the employees missing dates:
select employee
from t
group by employee
having min(eff_Dt) <> date '1900-01-01' or
max(end_effective_date) <> date '9999-01-01';
If you want the specific missing time periods, use lead() for most of them . . . and then union all to get the first one:
select employee,
       end_effective_date + interval '1' day as missing_eff_dt,
       next_eff_dt - interval '1' day as missing_end_dt
from (select t.*,
             lead(eff_dt) over (partition by employee order by eff_dt) as next_eff_dt
      from t
     ) t
where next_eff_dt > end_effective_date + interval '1' day
union all
select employee, date '1900-01-01',
       min(eff_dt) - interval '1' day
from t
group by employee
having min(eff_dt) > date '1900-01-01'
If I understood correctly, you need to find records for employees who have gaps between end dates and start dates, and for those who don't have 9999-01-01 as the max end date. The query below will work for that purpose.
select EMPLOYEE, EFF_DT, END_EFFECTIVE_DATE
from (
select tt.*
, count(distinct grp)
over(partition by EMPLOYEE) cnt
, max(END_EFFECTIVE_DATE)
over(partition by EMPLOYEE order by EFF_DT desc) max_END_EFFECTIVE_DATE
from (
select t.*
, case
when EFF_DT != nvl(
lag(END_EFFECTIVE_DATE, 1)over(partition by EMPLOYEE order by EFF_DT)
, date '-4712-01-01'
)+ 1
then row_number()over (partition by EMPLOYEE order by EFF_DT)
else null
end
grp
from your_table
) tt
)ttt
where cnt > 1
or max_END_EFFECTIVE_DATE < date '9999-01-01'
;
The query below uses the Tabibitosan method to stitch together the adjacent time periods. The method itself uses the analytic sum() function and standard aggregation; it works almost unchanged in any SQL dialect that supports basic analytic functions.
The output shows only the employees with incomplete data. It shows uninterrupted periods of "effectivity"; if data is "complete", then there should be only one such interval for the employee, from 1 JAN 1900 to 1 JAN 9999. Those are excluded; the output shows the employees with gaps at the beginning, in the middle, and/or the end, and for those employees it shows the interval (or intervals) of "effectivity".
While you didn't request this, the query could be modified easily to show the "missing" periods for each employee (the periods when they were not effective).
with
t (employee, eff_dt, end_effective_date, grp) as (
select employee, eff_dt, end_effective_date,
end_effective_date - sum(end_effective_date + 1 - eff_dt)
over (partition by employee order by eff_dt)
from sample_data
)
select employee, min(eff_dt) as eff_dt, max(end_effective_date) as end_dt
from t
group by employee, grp
having min(eff_dt) != date '1900-01-01'
or max(end_effective_date) != date '9999-01-01'
order by employee, eff_dt
;
EMPLOYEE EFF_DT END_DT
---------- ---------- ----------
3 1900-01-01 2020-01-01
4 1900-01-01 2016-01-01
4 2018-01-01 9999-01-01
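As a rough sketch of how the "missing" periods themselves might be listed, the query below uses lead() rather than the Tabibitosan grouping, so it is a different technique from the one above. It assumes that an employee's periods never overlap and that every history starts on 1900-01-01 (a gap before the first row would not be reported). The inline sample_data rows are copied from the question for employees 3 and 4; the output column names are my own.

```sql
with sample_data (employee, eff_dt, end_effective_date) as (
  select 3, date '1900-01-01', date '2015-12-31' from dual union all
  select 3, date '2016-01-01', date '2020-01-01' from dual union all
  select 4, date '1900-01-01', date '2016-01-01' from dual union all
  select 4, date '2018-01-01', date '9999-01-01' from dual
),
bounds as (
  -- next_eff_dt is the start of the following period; for the last period it
  -- defaults to the day after the required end of history (9999-01-01)
  select employee, eff_dt, end_effective_date,
         lead(eff_dt, 1, date '9999-01-01' + 1)
           over (partition by employee order by eff_dt) as next_eff_dt
  from sample_data
)
select employee,
       end_effective_date + 1 as missing_from,
       next_eff_dt - 1        as missing_to
from bounds
where next_eff_dt > end_effective_date + 1
order by employee, missing_from;
```

For the sample rows this should report 2020-01-02 to 9999-01-01 for employee 3 and 2016-01-02 to 2017-12-31 for employee 4, which are the two periods named in the question.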

Creating a status log from rows of datetimes of status changes

I'm pulling down some data from a remote API to a local SQL Server table, which is formatted like so. (imagine it's sorted by StatusDT descending)
DriverID StatusDT Status
-------- -------- ------
b103 2019-03-05 05:42:52.000 D
b103 2019-03-03 23:45:42.000 SB
b103 2019-03-03 21:49:41.000 ON
What would be the best way to eventually get to a point where I can return a query showing the total amount of time spent in each status on each day for each driver?
Also, it's possible that there could be gaps of a whole day or more between status updates, in which case I'd need a row showing a continuation of the previous status from 00:00:00 to 23:59:59 for each skipped day. So, if I'm looping through this table to populate another with the structure below, the example above would need to wind up looking like this... (again, sorted descending by date)
DriverID StartDT EndDT Status
-------- --------------- -------------- ------
b103 2019-03-05 05:42:52 D
b103 2019-03-05 00:00:00 2019-03-05 05:42:51 SB
b103 2019-03-04 00:00:00 2019-03-04 23:59:59 SB
b103 2019-03-03 23:45:42 2019-03-03 23:59:59 SB
b103 2019-03-03 21:49:41 2019-03-03 23:45:41 ON
Does that make sense?
I wound up dumping the API data to a "work" table and running a cursor over it to add rows to another table, with the starting and ending date/time, but I'm curious if there's another way that might be more efficient.
Thanks very much.
I think this query is what you need. I couldn't test it, however, for syntax errors:
with x as (
select
DriverID,
StatusDT as StartDT,
lead(StatusDT) over(partition by DriverID order by StatusDT) as EndDT,
Status
from my_table
)
select -- start & end on the same day
DriverID,
StartDT,
EndDT,
Status
from x
where convert(date, StartDT) = convert(date, EndDT)
or EndDT is null
union all
select -- start & end on different days; first day up to midnight
DriverID,
StartDT,
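dateadd(ms, -3, convert(datetime, convert(date, EndDT))) as EndDT, -- dateadd(ms, ...) is not supported on the date type, so convert back to datetime first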
Status
from x
where convert(date, StartDT) <> convert(date, EndDT)
and EndDT is not null
union all
select -- start & end on different days; next day from midnight
DriverID,
convert(date, EndDT) as StartDT,
EndDT,
Status
from x
where convert(date, StartDT) <> convert(date, EndDT)
and EndDT is not null
order by StartDT desc
Most of your answer is just using lead():
select driverid, status, statusdt,
lead(statusdt) over (partition by driverid order by statusdt) as enddte
from t;
This does not give the breaks by day, but you can add those. I think the easiest way is to add in the day boundaries (using a recursive CTE) and compute the status at each of those times. I would do the following:
use a recursive CTE to calculate the dates
"fill in" the statuses and union to the original table
use lead() to get the end date
This looks like:
with day_boundaries as (
      -- anchor: the first midnight after each driver's earliest status
      select driverid,
             dateadd(day, 1, cast(min(statusdt) as date)) as statusdt,
             max(statusdt) as finaldt
      from t
      group by driverid
      having datediff(day, min(statusdt), max(statusdt)) > 0
      union all
      -- recursive step: one more midnight, as long as it precedes the last status
      select driverid, dateadd(day, 1, statusdt), finaldt
      from day_boundaries
      where dateadd(day, 1, statusdt) < finaldt
     ),
     unioned as (
      select driverid, status, statusdt
      from t
      union all
      -- carry the same driver's most recent status forward to each midnight
      select db.driverid, s.status, db.statusdt
      from day_boundaries db cross apply
           (select top (1) t.status
            from t
            where t.driverid = db.driverid
              and t.statusdt < db.statusdt
            order by t.statusdt desc
           ) s
     )
select driverid, status, statusdt,
       lead(statusdt) over (partition by driverid order by statusdt) as enddte
from unioned;
Note that this does not subtract any seconds from the end date. The end date matches the previous start date. Time is continuous. It makes no sense to have gaps for records that should snugly fit together.

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?
You can do that using the lag() function in MS SQL (available starting with SQL Server 2012).
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
If lag() is not available in MS SQL 2008, then you can mimic it with row_number() and joining.
select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
Working SQLFiddle example is -> Here
FOR 2008 VERSION
SELECT
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
FROM t X
LEFT JOIN (
SELECT
*
, (SELECT MAX(ID) FROM t WHERE ID < A.ID) as ID2
FROM t A) Y ON X.ID = Y.ID2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1
Working SQLFiddle example is -> Here
This should work in 2008, it assumes that ranges in your table do not overlap. It will also eliminate rows where the end_date of the current row is a day before the start date of the next row.
with dtRanges as (
select start_dt, end_dt, row_number() over (order by start_dt) as rownum
from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
FROM
( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
from dtRanges dr1
left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
) t2
where
t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')
http://sqlfiddle.com/#!18/65238/1
SELECT
*
FROM
(
SELECT
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
FROM
yourTable
)
gaps
WHERE
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's then really clear that the range is one day long: if you subtract one from the other, you get one day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)
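To illustrate that last point, here is a sketch of the same gap search when end_dt is stored as an exclusive bound. The two rows are the question's ranges rewritten with exclusive ends (2018-07-20 inclusive becomes 2018-07-21 exclusive), and the CTE names ranges and gaps are only for illustration. The returned gaps are themselves half-open.

```sql
with ranges (start_dt, end_dt) as (
  select *
  from (values
    (cast('2018-07-17' as date), cast('2018-07-21' as date)),
    (cast('2018-07-23' as date), cast('2018-07-27' as date))
  ) v (start_dt, end_dt)
),
gaps as (
  -- a gap starts where one range ends and stops where the next one starts;
  -- no +1 / -1 adjustments are needed with exclusive ends
  select end_dt as gap_start,
         lead(start_dt, 1, cast('9999-12-31' as date)) over (order by start_dt) as gap_end
  from ranges
)
select gap_start, gap_end
from gaps
where gap_start < gap_end;
```

Adjacent ranges produce gap_start = gap_end and are filtered out by the final comparison.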
This solves your case, but provide some sample data if there will ever be overlaps, fringe cases, etc.
Take one day after your end date and 1 day before the next line's start date.
DECLARE @t TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO @t (id, start_dt, end_dt, prc)
VALUES
(10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
(10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
(10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
, DATEADD(DAY, -1, LEAD(start_dt, 1, '9999-12-31') OVER(ORDER BY id) )
FROM @t
You may want to take a look at this:
http://sqlfiddle.com/#!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.

To club the rows for week days

I have data like below:
StartDate EndDate Duration
----------
41890 41892 3
41898 41900 3
41906 41907 2
41910 41910 1
StartDate and EndDate are respective ID values for any dates from calendar. I want to calculate the sum of duration for consecutive days. Here I want to include the days which are weekends. E.g. in the above data, let's say 41908 and 41909 are weekends, then my required result set should look like below.
I already have another proc that can return me the next working day, i.e. if I pass 41907 or 41908 or 41909 as DateID in that proc, it will return 41910 as the next working day. Basically I want to check if the DateID returned by my proc when I pass the above EndDateID is same as the next StartDateID from above data, then both the rows should be clubbed. Below is the data I want to get.
ID StartDate EndDate Duration
----------
278457 41890 41892 3
278457 41898 41900 3
278457 41906 41910 3
Please let me know in case the requirement is not clear, I can explain further.
My Date Table is like below:
DateId Date Day
----------
41906 09-04-2014 Thursday
41907 09-05-2014 Friday
41908 09-06-2014 Saturday
41909 09-07-2014 Sunday
41910 09-08-2014 Monday
Here is the SQL Code for setup:
CREATE TABLE Table1
(
StartDate INT,
EndDate INT,
LeaveDuration INT
)
INSERT INTO Table1
VALUES(41890, 41892, 3),
(41898, 41900, 3),
(41906, 41907, 3),
(41910, 41910, 1)
CREATE TABLE DateTable
(
DateID INT,
Date DATETIME,
Day VARCHAR(20)
)
INSERT INTO DateTable
VALUES(41907, '09-05-2014', 'Friday'),
(41908, '09-06-2014', 'Saturday'),
(41909, '09-07-2014', 'Sunday'),
(41910, '09-08-2014', 'Monday'),
(41911, '09-09-2014', 'Tuesday')
This is rather complicated. Here is an approach using window functions.
First, use the date table to enumerate the dates without weekends (you can also take out holidays if you want). Then, expand the periods into one day per row, by using a non-equijoin.
You can then use a trick to identify sequential days. This trick is to generate a sequential number for each id and subtract it from the sequential number for the dates. This is a constant for sequential days. The final step is simply an aggregation.
The resulting query is something like this:
with d as (
select d.*, row_number() over (order by date) as seqnum
from dates d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from table t join
d ds
on ds.date between t.startdate and t.enddate
) t
group by t.id, grp;
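The row-number subtraction trick described above can be seen in isolation with a handful of made-up working-day sequence numbers. This is only a sketch: the values 10, 11, 12, 14, 15 are hypothetical seqnum values, standing for three consecutive working days followed by a one-day break and two more.

```sql
with worked (seqnum) as (
  select * from (values (10), (11), (12), (14), (15)) v (seqnum)
)
select seqnum,
       -- consecutive seqnum values advance in step with row_number(),
       -- so the difference stays constant within each run
       seqnum - row_number() over (order by seqnum) as grp
from worked;
```

The first three rows share grp = 9 and the last two share grp = 10, so grouping by grp yields one row per run of consecutive working days.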
EDIT:
The following is the version on this SQL Fiddle:
with d as (
select d.*, row_number() over (order by date) as seqnum
from datetable d
where day not in ('Saturday', 'Sunday')
)
select t.id, min(t.date) as startdate, max(t.date) as enddate, sum(duration)
from (select t.*, ds.seqnum, ds.date,
(ds.seqnum - row_number() over (partition by id order by ds.date) ) as grp
from (select t.*, 'abc' as id from table1 t) t join
d ds
on ds.dateid between t.startdate and t.enddate
) t
group by t.id, grp;
I believe this is working, but the date table doesn't have all the dates in it.