Finding most recent date based on consecutive dates - sql

I have s table that lists absences(holidays) of all employees, and what we would like to find out is who is away today, and the date that they will return.
Unfortunately, absences aren't given IDs, so you can't just retrieve the max date from an absence ID if one of those dates is today.
However, absences are given an incrementing ID per day as they are inputt, so I need a query that will find the employeeID if there is an entry with today's date, then increment the AbsenceID column to find the max date on that absence.
Table Example (assuming today's date is 11/11/2014, UK format):
AbsenceID EmployeeID AbsenceDate
100 10 11/11/2014
101 10 12/11/2014
102 10 13/11/2014
103 10 14/11/2014
104 10 15/11/2014
107 21 11/11/2014
108 21 12/11/2014
120 05 11/11/2014
130 15 20/11/2014
140 10 01/03/2015
141 10 02/03/2015
142 10 03/03/2015
143 10 04/03/2015
So, from the above, we'd want the return dates to be:
EmployeeID ReturnDate
10 15/11/2014
21 12/11/2014
05 11/11/2014
Edit: note that the 140-143 range couldn't be included in the results as they appears in the future, and none of the date range of the absence are today.
Presumably I need an iterative sub-function running on each entry with today's date where the employeeID matches.

So based on what I believe you're asking, you want to return a list of the people that are off today and when they are expected back based on the holidays that you have recorded in the system, which should only work only on consecutive days.
SQL Fiddle Demo
Schema Setup:
CREATE TABLE EmployeeAbsence
([AbsenceID] int, [EmployeeID] int, [AbsenceDate] DATETIME)
;
INSERT INTO EmployeeAbsence
([AbsenceID], [EmployeeID], [AbsenceDate])
VALUES
(100, 10, '2014-11-11'),
(101, 10, '2014-11-12'),
(102, 10, '2014-11-13'),
(103, 10, '2014-11-14'),
(104, 10, '2014-11-15'),
(107, 21, '2014-11-11'),
(108, 21, '2014-11-12'),
(120, 05, '2014-11-11'),
(130, 15, '2014-11-20')
;
Recursive CTE to generate the output:
;WITH cte AS (
SELECT EmployeeID, AbsenceDate
FROM dbo.EmployeeAbsence
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
UNION ALL
SELECT e.EmployeeID, e.AbsenceDate
FROM cte
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
)
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Results:
| EMPLOYEEID | Return Date |
|------------|---------------------------------|
| 5 | November, 11 2014 00:00:00+0000 |
| 10 | November, 15 2014 00:00:00+0000 |
| 21 | November, 12 2014 00:00:00+0000 |
Explanation:
The first SELECT in the CTE gets employees that are off today with this filter:
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
This result set is then UNIONED back to the EmployeeAbsence table with a join that matches EmployeeID as well as the AbsenceDate + 1 day to find the consecutive days recursively using:
-- add a day to the cte.AbsenceDate from the first SELECT
e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
The final SELECT simply groups the cte results by employee with the MAX AbsenceDate that has been calculated per employee.
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Excluding Weekends:
I've done a quick test based on your comment and the below modification to the INNER JOIN within the CTE should exclude weekends when adding the extra days if it detects that adding a day will result in a Saturday:
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = CASE WHEN datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7
THEN DATEADD(d,3,cte.AbsenceDate)
ELSE DATEADD(d,1,cte.AbsenceDate) END
So when you add a day: datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7, if it results in Saturday (7), then you add 3 days instead of 1 to get Monday: DATEADD(d,3,cte.AbsenceDate).

You'd need to do a few things to get this data into a usable format. You need to be able to work out where a group begins and ends. This is difficult with this example because there is no straight forward grouping column.
So that we can calculate when a group starts and ends, you need to create a CTE containing all the columns and also use LAG() to get the AbsenceID and EmployeeID from the previous row for each row. In this CTE you should also use ROW_NUMBER() at the same time so that we have a way to re-order the rows into the same order again.
Something like:
WITH
[AbsenceStage] AS (
SELECT [AbsenceID], [EmployeeID], [AbsenceDate]
,[RN] = ROW_NUMBER() OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[AbsenceID_Prev] = LAG([AbsenceID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[EmployeeID_Prev] = LAG([EmployeeID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
FROM [HR_Absence]
)
Now that we have this we can compare each row to the previous to see if the current row is in a different "group" to the previous row.
The condition would be something like:
[EmployeeID_Prev] IS NULL -- We have a new group if the previous row is null
OR [EmployeeID_Prev] <> [EmployeeID] -- Or if the previous row is for a different employee
OR [AbsenceID_Prev] <> ([AbsenceID]-1) -- Or if the AbsenceID is not sequential
You can then use this to join the CTE to it's self to find the first row in each group with something like:
....
FROM [AbsenceStage] AS [Row]
INNER JOIN [AbsenceStage] AS [First]
ON ([First].[RN] = (
-- Get the first row before ([RN] Less that or equal to) this one where it is the start of a grouping
SELECT MAX([RN]) FROM [AbsenceStage]
WHERE [RN] <= [Row].[RN] AND (
[EmployeeID_Prev] IS NULL
OR [EmployeeID_Prev] <> [EmployeeID]
OR [AbsenceID_Prev] <> ([AbsenceID]-1)
)
))
...
You can then GROUP BY the [First].[RN] which will now act like a group id and allow you to get the start and end date of each absence group.
SELECT
[Row].[EmployeeID]
,MIN([Row].[AbsenceDate]) AS [Absence_Begin]
,MAX([Row].[AbsenceDate]) AS [Absence_End]
...
-- FROM and INNER JOIN from above
...
GROUP BY [First].[RN], [Row].[EmployeeID];
You could then put all that into a view giving you the EmployeeID with the Start and End date of each absence. You can then easily pull out the Employee's currently off with a:
WHERE CAST(CURRENT_TIMESTAMP AS date) BETWEEN [Absence_Begin] AND [Absence_End]
SQL Fiddle

Like another answer here, I'm going to create the leave intervals, but via a different method. First the code:
declare #today date = getdate(); --use whatever date here
with g as (
select *, dateadd(day, -1 * row_number() over (partition by employeeid order by absencedate), AbsenceDate) as group_number
from employeeabsence
) , leave_intervals as (
select employeeid, min(absencedate) as [start], max(absencedate) as [end]
from g
group by EmployeeID, group_number
)
select employeeid, [start], [end]
from leave_intervals
where #today between [start] and [end]
By way of explanation, we first put a date value into a variable. I chose today, but this code will work for any date passed in. Next, we create a common table expression (CTE) that will add on a grouping column to your table. This is the meat of the solution, so it bears some treatment. Within a given interval, the AbsenceDate increases at a rate of one day per row. row_number() also increases at a rate of one per row. So, if we subtract a row_number() number of days from the AbsenceDate, we'll get another (arbitrary) date. The key here is to realize that that arbitrary date will be the same for every row in the interval, so we can use it to group by. From there, it's just a matter of doing just that; get the min and max per interval. Lastly, we find what intervals contain #today.

Related

SQL to find sum of total days in a window for a series of changes

Following is the table:
start_date
recorded_date
id
2021-11-10
2021-11-01
1a
2021-11-08
2021-11-02
1a
2021-11-11
2021-11-03
1a
2021-11-10
2021-11-04
1a
2021-11-10
2021-11-05
1a
I need a query to find the total day changes in aggregate for a given id. In this case, it changed from 10th Nov to 8th Nov so 2 days, then again from 8th to 11th Nov so 3 days and again from 11th to 10th for a day, and finally from 10th to 10th, that is 0 days.
In total there is a change of 2+3+1+0 = 6 days for the id - '1a'.
Basically for each change there is a recorded_date, so we arrange that in ascending order and then calculate the aggregate change of days grouped by id. The final result should be like:
id
Agg_Change
1a
6
Is there a way to do this using SQL. I am using vertica database.
Thanks.
you can use window function lead to get the difference between rows and then group by id
select id, sum(daydiff) Agg_Change
from (
select id, abs(datediff(day, start_Date, lead(start_date,1,start_date) over (partition by id order by recorded_date))) as daydiff
from tablename
) t group by id
It's indeed the use of LAG() to get the previous date in an OLAP query, and an outer query getting the absolute date difference, and the sum of it, grouping by id:
WITH
-- your input - don't use in real query ...
indata(start_date,recorded_date,id) AS (
SELECT DATE '2021-11-10',DATE '2021-11-01','1a'
UNION ALL SELECT DATE '2021-11-08',DATE '2021-11-02','1a'
UNION ALL SELECT DATE '2021-11-11',DATE '2021-11-03','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-04','1a'
UNION ALL SELECT DATE '2021-11-10',DATE '2021-11-05','1a'
)
-- real query starts here, replace following comma with "WITH" ...
,
w_lag AS (
SELECT
id
, start_date
, LAG(start_date) OVER w AS prevdt
FROM indata
WINDOW w AS (PARTITION BY id ORDER BY recorded_date)
)
SELECT
id
, SUM(ABS(DATEDIFF(DAY,start_date,prevdt))) AS dtdiff
FROM w_lag
GROUP BY id
-- out id | dtdiff
-- out ----+--------
-- out 1a | 6
I was thinking lag function will provide me the answer, but it kept giving me wrong answer because I had the wrong logic in one place. I have the answer I need:
with cte as(
select id, start_date, recorded_date,
row_number() over(partition by id order by recorded_date asc) as idrank,
lag(start_date,1) over(partition by id order by recorded_date asc) as prev
from table_temp
)
select id, sum(abs(date(start_date) - date(prev))) as Agg_Change
from cte
group by 1
If someone has a better solution please let me know.

Selecting the difference between dates in a stored procedure using a subquery

I can't get my head around whether this is even possible, but I feel like I might have done it before and lost that bit of code. I am trying to craft a select statement that contains an inner join on a subquery to show the number of days between two dates from the same table.
A simple example of the data structure would look like:
Name ID Date Day Hours
Bill 1 3/3/20 Thursday 8
Fred 2 4/3/20 Monday 6
Bill 1 8/3/20 Tuesday 2
Based on this data, I want to select each row plus an extra column which is the number of days between the date from each row for each ID. Something like:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < Date), Date) And ID = ID)
or for simplicity:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < 8/3/20), 8/3/20) And ID = 1)
The resulting dataset would look like:
Name ID Date Day Hours DaysBtwn
Bill 1 3/3/20 Thursday 8 4 (Assuming there was an earlier row in the table)
Fred 2 4/3/20 Monday 6 5 (Assuming there was an earlier row in the table)
Bill 1 8/3/20 Tuesday 2 5 (Based on the previous row date being 3/3/20 for Bill)
Does this make sense and am I trying to do this the wrong way? I want to do this for about 600000 rows in table and therefore efficiency is the key, so if there is a better way to do this, i'm open to suggestions.
You can use lag():
select t.*, datediff(day, lag(date) over(partition by id order by date), date) diff
from mytable t
I think you just want lag():
select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t;
Note: If you want to filter the data so rows in the result set are used for the lag() but not in the result set, then use a subquery:
select t.*
from (select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t
) t
where date < '2020-08-03';
Also note the use of the date constant as a string in YYYY-MM-DD format.

How to implement loops in SQL?

I am trying to calculate a KPI for each patient, the KPI is called "Initial prescription start date(IPST)".
The definition of IPST is if the patient has a negative history of using that particular medication for 60 days before a start date that start date is a IPST.
For example- See screen shot below, for patient with ID=101, I will start with IPST as 4/15/2019 , the difference in days between 4/15/2019 and 4/1/2019 is 14 <60 thus I will change my IPST to 4/1/2019.
Continuing with this iteration IPST for 101 is 3/17/2019 and 102 is 3/18/2018 as shown on the right hand side table.
I tried to build a UDF as below, where I am passing id of a patient and UDF is returning the IPST.
CREATE FUNCTION [Initial_Prescription_Date]
(
#id Uniqueidentifier
)
RETURNS date
AS
BEGIN
{
I am failing to implement this code here
}
I can get a list of Start_dates for a patient from a medication table like this
Select id, start_date from patient_medication
I will have to iterate through this list to get to the IPST for a patient.
I'll answer in order to start a dialog that we can work on.
The issue that I have is the the difference in days for ID = 102 between the last record and the one you've picked as the IPST is 29 days, but the IPST you've picked for 102 is 393 days, is that correct?
You don't need to loop to solve this problem. If you're comparing all of your dates only to your most recent, you can simply use MAX:
DECLARE #PatientRecords TABLE
(
ID INTEGER,
StartDate DATE,
Medicine VARCHAR(100)
)
INSERT INTO #PatientRecords VALUES
(101,'20181201','XYZ'),
(101,'20190115','XYZ'),
(101,'20190317','XYZ'),
(101,'20190401','XYZ'),
(101,'20190415','XYZ'),
(102,'20190401','XYZ'),
(102,'20190415','XYZ'),
(102,'20190315','XYZ'),
(102,'20180318','XYZ');
With maxCTE AS
(
SELECT *, DATEDIFF(DAY, StartDate, MAX(StartDate) OVER (PARTITION BY ID, MEDICINE ORDER BY StartDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) [IPSTDateDifference]
FROM #PatientRecords
)
SELECT m.ID, m.Medicine, MIN(m.StartDate) [IPST]
FROM maxCTE m
WHERE [IPSTDateDifference] < 60
GROUP BY m.ID, m.Medicine
ORDER BY 1,3;

Smoothing out a result set by date

Using SQL I need to return a smooth set of results (i.e. one per day) from a dataset that contains 0-N records per day.
The result per day should be the most recent previous value even if that is not from the same day. For example:
Starting data:
Date: Time: Value
19/3/2014 10:01 5
19/3/2014 11:08 3
19/3/2014 17:19 6
20/3/2014 09:11 4
22/3/2014 14:01 5
Required output:
Date: Value
19/3/2014 6
20/3/2014 4
21/3/2014 4
22/3/2014 5
First you need to complete the date range and fill in the missing dates (21/3/2014 in you example). This can be done by either joining a calendar table if you have one, or by using a recursive common table expression to generate the complete sequence on the fly.
When you have the complete sequence of dates finding the max value for the date, or from the latest previous non-null row becomes easy. In this query I use a correlated subquery to do it.
with cte as (
select min(date) date, max(date) max_date from your_table
union all
select dateadd(day, 1, date) date, max_date
from cte
where date < max_date
)
select
c.date,
(
select top 1 max(value) from your_table
where date <= c.date group by date order by date desc
) value
from cte c
order by c.date;
May be this works but try and let me know
select date, value from test where (time,date) in (select max(time),date from test group by date);

Get name of person having activity in every month - Oracle SQL

I have log table where there is are records with user id and the date for a specific activity done. I want to get names of users having activity every month. I am using the following query
select distinct(employeeid) from transactions
where eventdate between '01-OCT-13' AND '23-OCT-13'
and eventdate between '01-SEP-13' AND '01-OCT-13'
and eventdate between '01-AUG-13' AND '01-SEP-13'
and eventdate between '01-JUL-13' AND '01-AUG-13';
But this is doesn't work. Can someone please suggest any improvement?
Edit:
Since my questions seems to be a little confusing, here is an example
EmployeeID | Timestamp
a | 01-Jul-13
b | 01-Jul-13
a | 01-Aug-13
c | 01-Aug-13
a | 01-Sep-13
d | 01-Sep-13
a | 01-Oct-13
a | 01-Oct-13
In the above table, we can see that employee "a" has activity in all the months from July till October. So I want to find a list of all such employees.
You can use COUNT as analytical function and get the number of months for each employee and total number of months. Then select only those employees where both counts match.
select distinct employeeid
from (
select employeeid,
count(distinct trunc(eventdate,'month')) --count of months for each employee
over (partition by employeeid) as emp_months,
count(distinct trunc(eventdate,'month')) --count of all months
over () as all_months,
from transactions
)
where emp_months = all_months;
Wish I could give you the code, but i'm in a bit of a hurry, so this is more of a suggestion.
Have you tried extracting the distinct months (from eventdate), for every user, and if that has 10 rows (assuming it is October, you could dynamically change this), then the employee must of had an event every month.
By very inefficient, I think you mean it doesn't work. The same value can't be both in september, in october, etc.
Anyway, using #LaggKing suggestion, you could try this query:
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate)
FROM transactions
)
HAVING COUNT(*) = MONTH(NOW())
EDIT: You need to take year into account.
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate)
FROM transactions
WHERE YEAR(eventdate) = YEAR(NOW())
)
HAVING COUNT(*) = MONTH(NOW())