How to implement loops in SQL? - sql

I am trying to calculate a KPI for each patient; the KPI is called "Initial prescription start date (IPST)".
The definition of IPST: if the patient has no history of using that particular medication in the 60 days before a start date, that start date is an IPST.
For example (see the screenshot below), for the patient with ID=101 I start with an IPST of 4/15/2019; the difference in days between 4/15/2019 and 4/1/2019 is 14 < 60, so I change my IPST to 4/1/2019.
Continuing this iteration, the IPST for 101 is 3/17/2019 and for 102 is 3/18/2018, as shown in the right-hand table.
I tried to build a UDF as below, where I pass the id of a patient and the UDF returns the IPST.
CREATE FUNCTION [Initial_Prescription_Date]
(
@id Uniqueidentifier
)
RETURNS date
AS
BEGIN
{
I am failing to implement this code here
}
I can get a list of Start_dates for a patient from a medication table like this
Select id, start_date from patient_medication
I will have to iterate through this list to get to the IPST for a patient.

I'll answer in order to start a dialog that we can work on.
The issue I have is that, for ID = 102, the difference in days between the last record and the one you've picked as the IPST is 29 days, but the IPST you've picked for 102 is 393 days back; is that correct?
You don't need to loop to solve this problem. If you're comparing all of your dates only to your most recent, you can simply use MAX:
DECLARE @PatientRecords TABLE
(
ID INTEGER,
StartDate DATE,
Medicine VARCHAR(100)
)
INSERT INTO @PatientRecords VALUES
(101,'20181201','XYZ'),
(101,'20190115','XYZ'),
(101,'20190317','XYZ'),
(101,'20190401','XYZ'),
(101,'20190415','XYZ'),
(102,'20190401','XYZ'),
(102,'20190415','XYZ'),
(102,'20190315','XYZ'),
(102,'20180318','XYZ');
With maxCTE AS
(
-- Days between each start date and the patient's most recent start date for that medicine
SELECT *, DATEDIFF(DAY, StartDate, MAX(StartDate) OVER (PARTITION BY ID, MEDICINE ORDER BY StartDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) [IPSTDateDifference]
FROM @PatientRecords
)
SELECT m.ID, m.Medicine, MIN(m.StartDate) [IPST]
FROM maxCTE m
WHERE [IPSTDateDifference] < 60
GROUP BY m.ID, m.Medicine
ORDER BY 1,3;
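For comparison, the step-by-step iteration described in the question can be sketched procedurally. This is only an illustration in Python, not part of the SQL answer: the function name `initial_prescription_start` and the `gap_days` parameter are made up here, and the list of dates is assumed to be non-empty. Note that it compares each date with the current candidate rather than with the most recent date, so on other data it can give a different result from the MAX-based query.

```python
from datetime import date

def initial_prescription_start(start_dates, gap_days=60):
    """Walk back from the latest start date while each gap is under gap_days.

    Hypothetical helper; assumes start_dates is a non-empty list of dates.
    """
    ordered = sorted(start_dates, reverse=True)  # newest first
    ipst = ordered[0]
    for earlier in ordered[1:]:
        if (ipst - earlier).days >= gap_days:
            break  # a gap of gap_days or more ends the chain
        ipst = earlier
    return ipst

patient_101 = [date(2018, 12, 1), date(2019, 1, 15), date(2019, 3, 17),
               date(2019, 4, 1), date(2019, 4, 15)]
patient_102 = [date(2019, 4, 1), date(2019, 4, 15), date(2019, 3, 15),
               date(2018, 3, 18)]

print(initial_prescription_start(patient_101))
print(initial_prescription_start(patient_102))
```

With the sample rows this yields 3/17/2019 for patient 101 and 3/15/2019 for patient 102.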

Related

SQL Rolling Summary Statistics For Set Timeframe

I have a table that contains information about log-in events. Every time a user logs in, a record is added containing the user and the date. I want to calculate a new column in that table that holds the number of times that user has logged in in the past 31 days (including the current attempt). This is a simplified version of what my table looks like, including the column I want to add:
UserID Date LoginsInPast31Days
-------- ------------- --------------------
1 01-01-2012 1
2 02-01-2012 1
2 10-01-2012 2
1 25-01-2012 2
2 03-02-2012 2
2 22-03-2012 1
I know how to calculate a total amount of login attempts: I'd use COUNT(*) OVER (PARTITION BY UserId ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). However, I want to limit the timeframe to the last 31 days. My guess is that I have to change the UNBOUNDED PRECEDING, but how do I alter it in such a way that it selects the right number of rows?
One pretty efficient way is to add a record 31 days after each date with an increment of -1, so the running sum drops again once a login falls out of the window. It looks like this:
select userid, dte,
sum(inc) over (partition by userid order by dte) as LoginsInPast31Days
from ((select distinct userid, logindate as dte, 1 as inc from logins) union all
(select distinct userid, dateadd(day, 31, logindate) as dte, -1 as inc from logins)
) l;
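The +1/-1 idea can be traced outside SQL as well. The sketch below is an assumption-laden illustration in Python (the function name `rolling_login_counts` and the `window_days` parameter are invented for it): every login adds 1 on its own day and subtracts 1 again `window_days` later, and the running total is read off on the login days only.

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_login_counts(logins, window_days=31):
    """logins: iterable of (user_id, date). Returns {(user_id, date): count}."""
    login_days = set(logins)  # distinct logins, like SELECT DISTINCT
    deltas = defaultdict(int)
    for user, day in login_days:
        deltas[(user, day)] += 1                                 # login enters the window
        deltas[(user, day + timedelta(days=window_days))] -= 1   # login leaves the window
    running = defaultdict(int)
    counts = {}
    for user, day in sorted(deltas):  # per user, in date order
        running[user] += deltas[(user, day)]
        if (user, day) in login_days:
            counts[(user, day)] = running[user]
    return counts

sample = [
    (1, date(2012, 1, 1)), (2, date(2012, 1, 2)), (2, date(2012, 1, 10)),
    (1, date(2012, 1, 25)), (2, date(2012, 2, 3)), (2, date(2012, 3, 22)),
]
for key, count in sorted(rolling_login_counts(sample).items()):
    print(key, count)
```

On the sample table from the question this reproduces the LoginsInPast31Days column (1, 2 for user 1 and 1, 2, 2, 1 for user 2).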
You're almost there, 2 adjustments:
First make sure to group by user and date so you know how many rows to select
Secondly, you'll need to use 'ROWS BETWEEN CURRENT ROW AND 31 FOLLOWING' since you cannot limit the number of preceding records to use. By using descending sort order, you'll get the required result.
Combine these tips and you'll get:
SELECT SUM(COUNT(*)) OVER (
PARTITION BY t.userid
ORDER BY CAST(t.login_ts AS DATE) DESC
ROWS BETWEEN CURRENT ROW AND 31 FOLLOWING
)
FROM table AS t
GROUP BY t.userid, CAST(t.login_ts AS DATE)

Smoothing out a result set by date

Using SQL I need to return a smooth set of results (i.e. one per day) from a dataset that contains 0-N records per day.
The result per day should be the most recent previous value even if that is not from the same day. For example:
Starting data:
Date: Time: Value
19/3/2014 10:01 5
19/3/2014 11:08 3
19/3/2014 17:19 6
20/3/2014 09:11 4
22/3/2014 14:01 5
Required output:
Date: Value
19/3/2014 6
20/3/2014 4
21/3/2014 4
22/3/2014 5
First you need to complete the date range and fill in the missing dates (21/3/2014 in your example). This can be done either by joining a calendar table, if you have one, or by using a recursive common table expression to generate the complete sequence on the fly.
When you have the complete sequence of dates finding the max value for the date, or from the latest previous non-null row becomes easy. In this query I use a correlated subquery to do it.
with cte as (
select min(date) date, max(date) max_date from your_table
union all
select dateadd(day, 1, date) date, max_date
from cte
where date < max_date
)
select
c.date,
(
select top 1 max(value) from your_table
where date <= c.date group by date order by date desc
) value
from cte c
order by c.date;
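The same two steps (complete the date range, then carry the latest earlier value forward) can be sketched in Python. This is an illustration only; `daily_last_values` is a hypothetical name, the input is assumed to be a non-empty list of (datetime, value) pairs, and unlike the query above it takes the last reading of each day rather than the maximum.

```python
from datetime import datetime, timedelta

def daily_last_values(readings):
    """readings: non-empty list of (datetime, value). Returns [(date, value)], one per day."""
    ordered = sorted(readings)
    day = ordered[0][0].date()
    last_day = ordered[-1][0].date()
    result = []
    idx = 0
    current = None
    while day <= last_day:
        # Consume every reading up to and including this day; the last one wins.
        while idx < len(ordered) and ordered[idx][0].date() <= day:
            current = ordered[idx][1]
            idx += 1
        result.append((day, current))
        day += timedelta(days=1)
    return result

readings = [
    (datetime(2014, 3, 19, 10, 1), 5),
    (datetime(2014, 3, 19, 11, 8), 3),
    (datetime(2014, 3, 19, 17, 19), 6),
    (datetime(2014, 3, 20, 9, 11), 4),
    (datetime(2014, 3, 22, 14, 1), 5),
]
for day, value in daily_last_values(readings):
    print(day, value)
```

For the sample data this gives 6, 4, 4, 5 for 19/3 through 22/3, matching the required output.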
Maybe this works; try it and let me know:
select date, value from test where (time,date) in (select max(time),date from test group by date);

Finding most recent date based on consecutive dates

I have a table that lists absences (holidays) of all employees, and what we would like to find out is who is away today, and the date that they will return.
Unfortunately, absences aren't given IDs, so you can't just retrieve the max date from an absence ID if one of those dates is today.
However, absences are given an incrementing ID per day as they are input, so I need a query that will find the employeeID if there is an entry with today's date, then increment the AbsenceID column to find the max date on that absence.
Table Example (assuming today's date is 11/11/2014, UK format):
AbsenceID EmployeeID AbsenceDate
100 10 11/11/2014
101 10 12/11/2014
102 10 13/11/2014
103 10 14/11/2014
104 10 15/11/2014
107 21 11/11/2014
108 21 12/11/2014
120 05 11/11/2014
130 15 20/11/2014
140 10 01/03/2015
141 10 02/03/2015
142 10 03/03/2015
143 10 04/03/2015
So, from the above, we'd want the return dates to be:
EmployeeID ReturnDate
10 15/11/2014
21 12/11/2014
05 11/11/2014
Edit: note that the 140-143 range isn't included in the results because it lies in the future and none of the dates in that absence is today.
Presumably I need an iterative sub-function running on each entry with today's date where the employeeID matches.
So based on what I believe you're asking, you want to return a list of the people who are off today and when they are expected back, based on the holidays recorded in the system, which should only work on consecutive days.
SQL Fiddle Demo
Schema Setup:
CREATE TABLE EmployeeAbsence
([AbsenceID] int, [EmployeeID] int, [AbsenceDate] DATETIME)
;
INSERT INTO EmployeeAbsence
([AbsenceID], [EmployeeID], [AbsenceDate])
VALUES
(100, 10, '2014-11-11'),
(101, 10, '2014-11-12'),
(102, 10, '2014-11-13'),
(103, 10, '2014-11-14'),
(104, 10, '2014-11-15'),
(107, 21, '2014-11-11'),
(108, 21, '2014-11-12'),
(120, 05, '2014-11-11'),
(130, 15, '2014-11-20')
;
Recursive CTE to generate the output:
;WITH cte AS (
SELECT EmployeeID, AbsenceDate
FROM dbo.EmployeeAbsence
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
UNION ALL
SELECT e.EmployeeID, e.AbsenceDate
FROM cte
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
)
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Results:
| EMPLOYEEID | Return Date |
|------------|---------------------------------|
| 5 | November, 11 2014 00:00:00+0000 |
| 10 | November, 15 2014 00:00:00+0000 |
| 21 | November, 12 2014 00:00:00+0000 |
Explanation:
The first SELECT in the CTE gets employees that are off today with this filter:
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
This result set is then UNIONED back to the EmployeeAbsence table with a join that matches EmployeeID as well as the AbsenceDate + 1 day to find the consecutive days recursively using:
-- add a day to the cte.AbsenceDate from the first SELECT
e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
The final SELECT simply groups the cte results by employee with the MAX AbsenceDate that has been calculated per employee.
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Excluding Weekends:
I've done a quick test based on your comment and the below modification to the INNER JOIN within the CTE should exclude weekends when adding the extra days if it detects that adding a day will result in a Saturday:
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = CASE WHEN datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7
THEN DATEADD(d,3,cte.AbsenceDate)
ELSE DATEADD(d,1,cte.AbsenceDate) END
So when you add a day: datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7, if it results in Saturday (7), then you add 3 days instead of 1 to get Monday: DATEADD(d,3,cte.AbsenceDate).
You'd need to do a few things to get this data into a usable format. You need to be able to work out where a group begins and ends. This is difficult with this example because there is no straight forward grouping column.
So that we can calculate when a group starts and ends, you need to create a CTE containing all the columns and also use LAG() to get the AbsenceID and EmployeeID from the previous row for each row. In this CTE you should also use ROW_NUMBER() at the same time so that we have a way to re-order the rows into the same order again.
Something like:
WITH
[AbsenceStage] AS (
SELECT [AbsenceID], [EmployeeID], [AbsenceDate]
,[RN] = ROW_NUMBER() OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[AbsenceID_Prev] = LAG([AbsenceID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[EmployeeID_Prev] = LAG([EmployeeID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
FROM [HR_Absence]
)
Now that we have this we can compare each row to the previous to see if the current row is in a different "group" to the previous row.
The condition would be something like:
[EmployeeID_Prev] IS NULL -- We have a new group if the previous row is null
OR [EmployeeID_Prev] <> [EmployeeID] -- Or if the previous row is for a different employee
OR [AbsenceID_Prev] <> ([AbsenceID]-1) -- Or if the AbsenceID is not sequential
You can then use this to join the CTE to itself to find the first row in each group with something like:
....
FROM [AbsenceStage] AS [Row]
INNER JOIN [AbsenceStage] AS [First]
ON ([First].[RN] = (
-- Get the latest row at or before this one ([RN] less than or equal to) that is the start of a grouping
SELECT MAX([RN]) FROM [AbsenceStage]
WHERE [RN] <= [Row].[RN] AND (
[EmployeeID_Prev] IS NULL
OR [EmployeeID_Prev] <> [EmployeeID]
OR [AbsenceID_Prev] <> ([AbsenceID]-1)
)
))
...
You can then GROUP BY the [First].[RN] which will now act like a group id and allow you to get the start and end date of each absence group.
SELECT
[Row].[EmployeeID]
,MIN([Row].[AbsenceDate]) AS [Absence_Begin]
,MAX([Row].[AbsenceDate]) AS [Absence_End]
...
-- FROM and INNER JOIN from above
...
GROUP BY [First].[RN], [Row].[EmployeeID];
You could then put all that into a view giving you the EmployeeID with the start and end date of each absence. You can then easily pull out the employees currently off with a:
WHERE CAST(CURRENT_TIMESTAMP AS date) BETWEEN [Absence_Begin] AND [Absence_End]
Like another answer here, I'm going to create the leave intervals, but via a different method. First the code:
declare @today date = getdate(); --use whatever date here
with g as (
select *, dateadd(day, -1 * row_number() over (partition by employeeid order by absencedate), AbsenceDate) as group_number
from employeeabsence
) , leave_intervals as (
select employeeid, min(absencedate) as [start], max(absencedate) as [end]
from g
group by EmployeeID, group_number
)
select employeeid, [start], [end]
from leave_intervals
where @today between [start] and [end]
By way of explanation, we first put a date value into a variable. I chose today, but this code will work for any date passed in. Next, we create a common table expression (CTE) that adds a grouping column to your table. This is the meat of the solution, so it bears some treatment. Within a given interval, the AbsenceDate increases at a rate of one day per row. row_number() also increases at a rate of one per row. So, if we subtract a row_number() number of days from the AbsenceDate, we'll get another (arbitrary) date. The key here is to realize that this arbitrary date will be the same for every row in the interval, so we can use it to group by. From there, it's just a matter of doing just that: get the min and max per interval. Lastly, we find which intervals contain @today.
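To see why the subtraction produces a constant per interval, here is a small Python sketch of the same trick for a single employee. The function name `leave_intervals` is borrowed from the CTE name for readability; everything else is an illustrative assumption, including that each absence date appears at most once per employee.

```python
from datetime import date, timedelta
from itertools import groupby

def leave_intervals(absence_dates):
    """Group one employee's absence dates into runs of consecutive days."""
    ordered = sorted(set(absence_dates))
    # date minus row number: identical for every row of a consecutive run
    keyed = [(day - timedelta(days=rn), day)
             for rn, day in enumerate(ordered, start=1)]
    intervals = []
    for _, rows in groupby(keyed, key=lambda pair: pair[0]):
        days = [day for _, day in rows]
        intervals.append((days[0], days[-1]))  # (start, end) of the run
    return intervals

employee_10 = [date(2014, 11, d) for d in range(11, 16)] + \
              [date(2015, 3, d) for d in range(1, 5)]
today = date(2014, 11, 11)  # the "today" assumed in the question
for start, end in leave_intervals(employee_10):
    if start <= today <= end:
        print(start, end)
```

For employee 10 the two runs are 11-15 November 2014 and 1-4 March 2015; only the first contains the assumed today, so the return date is 15/11/2014.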

Get name of person having activity in every month - Oracle SQL

I have a log table with records containing the user id and the date on which a specific activity was done. I want to get the names of users who have activity in every month. I am using the following query:
select distinct(employeeid) from transactions
where eventdate between '01-OCT-13' AND '23-OCT-13'
and eventdate between '01-SEP-13' AND '01-OCT-13'
and eventdate between '01-AUG-13' AND '01-SEP-13'
and eventdate between '01-JUL-13' AND '01-AUG-13';
But this doesn't work. Can someone please suggest an improvement?
Edit:
Since my question seems to be a little confusing, here is an example:
EmployeeID | Timestamp
a | 01-Jul-13
b | 01-Jul-13
a | 01-Aug-13
c | 01-Aug-13
a | 01-Sep-13
d | 01-Sep-13
a | 01-Oct-13
a | 01-Oct-13
In the above table, we can see that employee "a" has activity in all the months from July till October. So I want to find a list of all such employees.
You can use COUNT as analytical function and get the number of months for each employee and total number of months. Then select only those employees where both counts match.
select distinct employeeid
from (
select employeeid,
count(distinct trunc(eventdate,'month')) --count of months for each employee
over (partition by employeeid) as emp_months,
count(distinct trunc(eventdate,'month')) --count of all months
over () as all_months
from transactions
)
where emp_months = all_months;
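The comparison of the two counts can be illustrated with sets. The Python sketch below is not Oracle code, and `active_every_month` is a hypothetical name: it collects the distinct (year, month) pairs seen overall and per employee, then keeps the employees whose set equals the overall set.

```python
from datetime import date

def active_every_month(events):
    """events: list of (employee_id, date). Returns employees active in every month seen."""
    all_months = {(d.year, d.month) for _, d in events}
    months_by_employee = {}
    for employee, d in events:
        months_by_employee.setdefault(employee, set()).add((d.year, d.month))
    return sorted(employee for employee, months in months_by_employee.items()
                  if months == all_months)

events = [
    ("a", date(2013, 7, 1)), ("b", date(2013, 7, 1)),
    ("a", date(2013, 8, 1)), ("c", date(2013, 8, 1)),
    ("a", date(2013, 9, 1)), ("d", date(2013, 9, 1)),
    ("a", date(2013, 10, 1)), ("a", date(2013, 10, 1)),
]
print(active_every_month(events))
```

With the example table from the question, only employee "a" is returned.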
Wish I could give you the code, but I'm in a bit of a hurry, so this is more of a suggestion.
Have you tried extracting the distinct months (from eventdate) for every user? If that gives 10 rows (assuming it is October; you could change this dynamically), then the employee must have had an event every month.
By very inefficient, I think you mean it doesn't work. The same value can't be in September, in October, etc. all at once.
Anyway, using @LaggKing's suggestion, you could try this query:
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate) AS event_month
FROM transactions
) AS m
GROUP BY employeeid
HAVING COUNT(*) = MONTH(NOW())
EDIT: You need to take year into account.
SELECT employeeid
FROM (
SELECT DISTINCT employeeid, MONTH(eventdate) AS event_month
FROM transactions
WHERE YEAR(eventdate) = YEAR(NOW())
) AS m
GROUP BY employeeid
HAVING COUNT(*) = MONTH(NOW())

Selecting a date in the future

I am working on a shipment delivery report to determine if shipments are made within a shipment window.
Every release has a Ship_Date value that is the date that the release must ship. Some releases though have a late window integer value that says if the shipment is made within X number of days then it is still considered on time.
This is complicated by another table which holds valid ship days for the month (used to exclude holidays, weekends, and such).
Order_Releases_Table
Part_No,
Quantity,
Ship_Date,
Window
Shipping_Date
Shipping_Day
Sample Data
Order_Releases_Table
Part_No Quantity Ship_Date Window
ABC 100 9/1/2011 0
XYZ 200 9/1/2011 2
Shipping_Date
9/1/2011
9/2/2011
9/5/2011
So with this data part ABC has to ship on 9/1 to be considered on time. Part XYZ though can ship up to 2 days past 9/1 and still be considered on time, but since 9/3 isn't in our shipping days, then 9/5 is the last day it can ship and still be considered on time.
I think the answer lies in joining in a sub query of the shipping days table that assigns a row number to the shipping_day field.
SELECT
Row_Number() OVER(ORDER BY Shipping_Date) AS Day_No,
Shipping_day
FROM Shipping_Date
WHERE Shipping_Day > Ship_Date
RETURNS
Day_No Shipping_Day
1 9/2/2011
2 9/5/2011
Then if I simply pick up the date where the Day_No from this sub query is equal to window value from the release, I then have the last day a particular shipment can ship and still be considered on time.
I'm having a hard time wrapping it all up into the final query, though.
Is this the correct way to approach the problem?
Maybe this will get you started:
DECLARE @t TABLE (Part CHAR(3), ShipDate DATETIME, Window INT)
DECLARE @ship TABLE (ShipDate DATETIME)
INSERT INTO @t
( Part, ShipDate, Window )
SELECT 'abc', '20110901', 0
UNION
SELECT 'xyz', '20110901', 2
INSERT INTO @ship
( ShipDate )
SELECT '20110901'
UNION
SELECT '20110905'
UNION
SELECT '20110910'
-- For each release, find the first valid shipping date on or after ShipDate + Window days
SELECT Part, ShipDate, Window,
(SELECT MIN(ShipDate) AS NextShip
FROM @ship S
WHERE s.shipDate >= DATEADD(day, t.Window, t.shipDate))
FROM @t t
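The row-number idea from the question (number the shipping days after Ship_Date and pick the one whose Day_No equals the window) can also be sketched directly. The Python below is an illustration under that reading, with a hypothetical function name `last_on_time_date`; it counts the window in shipping days, whereas the query above adds calendar days and then takes the next shipping date. Both give 9/5 for part XYZ with the question's sample calendar.

```python
from bisect import bisect_right
from datetime import date

def last_on_time_date(ship_date, window, shipping_days):
    """Return the window-th shipping day after ship_date (ship_date itself if window is 0)."""
    if window == 0:
        return ship_date
    days = sorted(shipping_days)
    first_after = bisect_right(days, ship_date)  # index of first day strictly after ship_date
    position = first_after + window - 1          # Day_No == window
    if position >= len(days):
        return None  # calendar does not extend far enough
    return days[position]

shipping_days = [date(2011, 9, 1), date(2011, 9, 2), date(2011, 9, 5)]
print(last_on_time_date(date(2011, 9, 1), 0, shipping_days))  # part ABC
print(last_on_time_date(date(2011, 9, 1), 2, shipping_days))  # part XYZ
```

Returning None when the calendar runs out is a design choice of this sketch; a report might prefer to flag such rows instead.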