Example dataset.
CLINIC  APPTDATETIME         PATIENT_ID  NEW_FOLLOWUP_FLAG
TGYN    20/07/2022 09:00:00  1           N
TGYN    20/07/2022 09:45:00  2           F
TGYN    20/07/2022 10:05:00  NULL        NULL
TGYN    20/07/2022 10:05:00  4           F
TGYN    20/07/2022 10:25:00  5           F
TGYN    20/07/2022 10:30:00  NULL        NULL
TGYN    20/07/2022 10:35:00  NULL        NULL
TGYN    20/07/2022 10:40:00  NULL        NULL
TGYN    20/07/2022 10:45:00  NULL        NULL
TGYN    20/07/2022 11:10:00  6           F
TGYN    20/07/2022 11:10:00  7           F
As you can see there are times with multiple patients, times with empty slots and times with both (generally DQ errors).
I'm trying to calculate how many slots were filled and how many of those were new (N) or follow-up (F). If there is a slot with a patient and also a NULL row, then I only want to count the row with the patient. If there are only NULL rows for a timeslot, then I want to count that as 'unfilled'.
From this dataset I would like to calculate the following for each group of clinic and apptdatetime.
CLINIC  APPTDATE    N Capacity  F Capacity  Unfilled Capacity
TGYN    20/07/2022  1           5           4
What's the best way to go about this?
I've considered taking a list of distinct values for each clinic and date and then joining to that, but wanted to know if there is a more elegant way.
First I set up some demo data in a table from what you provided:
CREATE TABLE #table (CLINIC NVARCHAR(4), APPTDATETIME DATETIME, PATIENT_ID INT, NEW_FOLLOWUP_FLAG NVARCHAR(1))
INSERT INTO #table (CLINIC, APPTDATETIME, PATIENT_ID, NEW_FOLLOWUP_FLAG) VALUES
('TGYN','07/20/2022 09:00:00', 1 ,'N'),
('TGYN','07/20/2022 09:45:00', 2 ,'F'),
('TGYN','07/20/2022 10:05:00', NULL ,NULL),
('TGYN','07/20/2022 10:05:00', 4 ,'F'),
('TGYN','07/20/2022 10:25:00', 5 ,'F'),
('TGYN','07/20/2022 10:30:00', NULL ,NULL),
('TGYN','07/20/2022 10:35:00', NULL ,NULL),
('TGYN','07/20/2022 10:40:00', NULL ,NULL),
('TGYN','07/20/2022 10:45:00', NULL ,NULL),
('TGYN','07/20/2022 11:10:00', 6 ,'F'),
('TGYN','07/20/2022 11:10:00', 7 ,'F')
Reading through your description it looks like you'd need a couple of case statements and a group by:
SELECT CLINIC, CAST(APPTDATETIME AS DATE) AS APPTDATE,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'N' THEN 1 ELSE 0 END) AS NCapacity,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG = 'F' THEN 1 ELSE 0 END) AS FCapacity,
SUM(CASE WHEN NEW_FOLLOWUP_FLAG IS NULL THEN 1 ELSE 0 END) AS UnfilledCapacity
FROM #table
GROUP BY CLINIC, CAST(APPTDATETIME AS DATE)
Which returns a result set like this:
CLINIC APPTDATE NCapacity FCapacity UnfilledCapacity
------------------------------------------------------------
TGYN 2022-07-20 1 5 5
Note that I cast the datetime column to a date and grouped by that.
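Each CASE expression tests one condition (is the column NULL, 'F' or 'N') and returns 1 when it matches, and those 1s are then summed. Note that this counts the NULL row at 10:05 as unfilled even though a patient is booked in the same slot, which is why UnfilledCapacity comes out as 5 rather than the 4 in your expected output.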
Your title also asked about finding duplicates in the data set. You should likely have a constraint on this table making CLINIC and APPTDATETIME forcibly unique. This would prevent rows even being inserted as dupes.
If you want to find them in the table try something like this:
SELECT CLINIC, APPTDATETIME, COUNT(*) AS Cnt
FROM #table
GROUP BY CLINIC, APPTDATETIME
HAVING COUNT(*) > 1
Which from the test data returned:
CLINIC APPTDATETIME Cnt
-----------------------------------
TGYN 2022-07-20 10:05:00.000 2
TGYN 2022-07-20 11:10:00.000 2
Indicating there are dupes for those clinic/datetime combinations.
HAVING is the magic here: we count the rows in each group and keep only the groups whose count is greater than 1.
This is basically a straight-forward conditional aggregation with group by, with the slight complication of excluding NULL rows where a corresponding appointment also exists.
For this you can add an anti-semi self-join using NOT EXISTS, so that a NULL row is not counted towards unfilled capacity when a valid appointment also exists for the same date and time:
select CLINIC, Convert(date, APPTDATETIME) AppDate,
Sum(case when NEW_FOLLOWUP_FLAG = 'N' then 1 end) N_Capacity,
Sum(case when NEW_FOLLOWUP_FLAG = 'F' then 1 end) F_Capacity,
Sum(case when NEW_FOLLOWUP_FLAG is null then 1 end) U_Capacity
from t
where not exists (
select * from t t2
where t.PATIENT_ID is null
and t2.PATIENT_ID is not null
and t.CLINIC = t2.CLINIC -- only compare slots within the same clinic
and t.APPTDATETIME = t2.APPTDATETIME
)
group by CLINIC, Convert(date, APPTDATETIME);
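To make the counting rule easier to check by hand, here is a minimal Python sketch of the same logic. It is not the query above translated line by line: the function name `slot_capacity`, the `ROWS` list and the ISO-formatted timestamps are assumptions made for illustration, and it treats a slot that has only NULL rows as a single unfilled slot.

```python
from collections import defaultdict

# Sample rows from the question: (clinic, appt_datetime, patient_id, flag).
ROWS = [
    ("TGYN", "2022-07-20 09:00:00", 1, "N"),
    ("TGYN", "2022-07-20 09:45:00", 2, "F"),
    ("TGYN", "2022-07-20 10:05:00", None, None),
    ("TGYN", "2022-07-20 10:05:00", 4, "F"),
    ("TGYN", "2022-07-20 10:25:00", 5, "F"),
    ("TGYN", "2022-07-20 10:30:00", None, None),
    ("TGYN", "2022-07-20 10:35:00", None, None),
    ("TGYN", "2022-07-20 10:40:00", None, None),
    ("TGYN", "2022-07-20 10:45:00", None, None),
    ("TGYN", "2022-07-20 11:10:00", 6, "F"),
    ("TGYN", "2022-07-20 11:10:00", 7, "F"),
]


def slot_capacity(rows):
    """Count N, F and unfilled slots per (clinic, date)."""
    # Collect every row under its slot: clinic plus exact appointment time.
    slots = defaultdict(list)
    for clinic, appt, patient_id, flag in rows:
        slots[(clinic, appt)].append((patient_id, flag))

    totals = defaultdict(lambda: {"N": 0, "F": 0, "unfilled": 0})
    for (clinic, appt), entries in slots.items():
        day = appt[:10]  # 'YYYY-MM-DD' prefix of the ISO timestamp
        booked = [flag for patient_id, flag in entries if patient_id is not None]
        if booked:
            # At least one patient: count patients only, ignore NULL rows.
            for flag in booked:
                totals[(clinic, day)][flag] += 1
        else:
            # Only NULL rows in this slot: one unfilled slot.
            totals[(clinic, day)]["unfilled"] += 1
    return dict(totals)


print(slot_capacity(ROWS))
```

For the sample data this gives 1 new, 5 follow-up and 4 unfilled slots for TGYN on 2022-07-20, which matches the expected output in the question.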
My question involves how to identify an index discharge.
The index discharge is the earliest discharge. On that date, the 30 day window starts. Any admissions during that time period are considered readmissions, and they should be ignored. Once the 30 day window is over, then any subsequent discharge is considered an index and the 30 day window begins again.
I can't seem to work out the logic for this. I've tried different windowing functions, I've tried cross joins and cross applies. The issue I keep encountering is that a readmission cannot be an index admission. It must be excluded.
I have successfully written a while loop to solve this problem, but I'd really like to get this in a set based format, if it's possible. I haven't been successful so far.
Ultimate goal is this -
id  AdmitDate                DischargeDate            MedicalRecordNumber  IndexYN
1   2021-03-03 00:00:00.000  2021-03-09 13:20:00.000  X0090362             1
4   2021-03-05 00:00:00.000  2021-03-10 16:00:00.000  X0012614             1
6   2021-05-18 00:00:00.000  2021-05-21 22:20:00.000  X0012614             1
7   2021-06-21 00:00:00.000  2021-07-08 13:30:00.000  X0012614             1
8   2021-02-03 00:00:00.000  2021-02-09 17:00:00.000  X0019655             1
10  2021-03-23 00:00:00.000  2021-03-26 16:40:00.000  X0019655             1
11  2021-03-15 00:00:00.000  2021-03-18 15:53:00.000  X4135958             1
13  2021-05-17 00:00:00.000  2021-05-23 14:55:00.000  X4135958             1
15  2021-06-24 00:00:00.000  2021-07-13 15:06:00.000  X4135958             1
Sample code is below.
CREATE TABLE #Admissions
(
[id] INT,
[AdmitDate] DATETIME,
[DischargeDateTime] DATETIME,
[UnitNumber] VARCHAR(20),
[IndexYN] INT
)
INSERT INTO #Admissions
VALUES( 1 ,'2021-03-03' ,'2021-03-09 13:20:00.000' ,'X0090362', NULL)
,(2 ,'2021-03-27' ,'2021-03-30 19:59:00.000' ,'X0090362', NULL)
,(3 ,'2021-03-31' ,'2021-04-04 05:57:00.000' ,'X0090362', NULL)
,(4 ,'2021-03-05' ,'2021-03-10 16:00:00.000' ,'X0012614', NULL)
,(5 ,'2021-03-28' ,'2021-04-16 13:55:00.000' ,'X0012614', NULL)
,(6 ,'2021-05-18' ,'2021-05-21 22:20:00.000' ,'X0012614', NULL)
,(7 ,'2021-06-21' ,'2021-07-08 13:30:00.000' ,'X0012614', NULL)
,(8 ,'2021-02-03' ,'2021-02-09 17:00:00.000' ,'X0019655', NULL)
,(9 ,'2021-02-17' ,'2021-02-22 17:25:00.000' ,'X0019655', NULL)
,(10 ,'2021-03-23' ,'2021-03-26 16:40:00.000' ,'X0019655', NULL)
,(11 ,'2021-03-15' ,'2021-03-18 15:53:00.000' ,'X4135958', NULL)
,(12 ,'2021-04-08' ,'2021-04-13 19:42:00.000' ,'X4135958', NULL)
,(13 ,'2021-05-17' ,'2021-05-23 14:55:00.000' ,'X4135958', NULL)
,(14 ,'2021-06-09' ,'2021-06-14 12:45:00.000' ,'X4135958', NULL)
,(15 ,'2021-06-24' ,'2021-07-13 15:06:00.000' ,'X4135958', NULL)
You can use a recursive CTE to identify all rows associated with each "index" discharge:
with a as (
-- number each patient's stays separately so windows never cross patients
select a.*, row_number() over (partition by unitnumber order by dischargedatetime) as seqnum
from admissions a
),
cte as (
select id, admitdate, dischargedatetime, unitnumber, seqnum, dischargedatetime as index_dischargedatetime
from a
where seqnum = 1
union all
select a.id, a.admitdate, a.dischargedatetime, a.unitnumber, a.seqnum,
-- an admission more than 30 days after the current index discharge starts a new window
(case when a.admitdate > dateadd(day, 30, cte.index_dischargedatetime)
then a.dischargedatetime else cte.index_dischargedatetime
end) as index_dischargedatetime
from cte join
a
on a.unitnumber = cte.unitnumber and a.seqnum = cte.seqnum + 1
)
select *
from cte;
You can then incorporate this into an update:
update admissions
set indexyn = (case when admissions.dischargedatetime = cte.index_dischargedatetime then 'Y' else 'N' end)
from cte
where cte.id = admissions.id;
Here is a db<>fiddle. Note that I changed the type of IndexYN to a character to assign 'Y'/'N', which makes sense given the column name.
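The windowing rule from the question can also be checked outside the database. The following is a minimal Python sketch, assuming the window is measured from the index discharge to the next admission date and is tracked per UnitNumber; the names `flag_index_discharges` and `ADMISSIONS` are made up for illustration and the data is the sample from the question.

```python
from datetime import datetime, timedelta


def _dt(text):
    # Accepts 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM' strings.
    fmt = "%Y-%m-%d %H:%M" if " " in text else "%Y-%m-%d"
    return datetime.strptime(text, fmt)


# (id, admit, discharge, unit_number) from the question's sample data.
ADMISSIONS = [
    (1, _dt("2021-03-03"), _dt("2021-03-09 13:20"), "X0090362"),
    (2, _dt("2021-03-27"), _dt("2021-03-30 19:59"), "X0090362"),
    (3, _dt("2021-03-31"), _dt("2021-04-04 05:57"), "X0090362"),
    (4, _dt("2021-03-05"), _dt("2021-03-10 16:00"), "X0012614"),
    (5, _dt("2021-03-28"), _dt("2021-04-16 13:55"), "X0012614"),
    (6, _dt("2021-05-18"), _dt("2021-05-21 22:20"), "X0012614"),
    (7, _dt("2021-06-21"), _dt("2021-07-08 13:30"), "X0012614"),
    (8, _dt("2021-02-03"), _dt("2021-02-09 17:00"), "X0019655"),
    (9, _dt("2021-02-17"), _dt("2021-02-22 17:25"), "X0019655"),
    (10, _dt("2021-03-23"), _dt("2021-03-26 16:40"), "X0019655"),
    (11, _dt("2021-03-15"), _dt("2021-03-18 15:53"), "X4135958"),
    (12, _dt("2021-04-08"), _dt("2021-04-13 19:42"), "X4135958"),
    (13, _dt("2021-05-17"), _dt("2021-05-23 14:55"), "X4135958"),
    (14, _dt("2021-06-09"), _dt("2021-06-14 12:45"), "X4135958"),
    (15, _dt("2021-06-24"), _dt("2021-07-13 15:06"), "X4135958"),
]


def flag_index_discharges(admissions, window_days=30):
    """Return the ids of the index discharges, one pass per patient."""
    index_ids = set()
    window_end = {}  # unit_number -> end of that patient's current window
    for adm_id, admit, discharge, unit in sorted(admissions, key=lambda r: (r[3], r[1])):
        # First stay for the patient, or admitted after the window closed.
        if unit not in window_end or admit > window_end[unit]:
            index_ids.add(adm_id)
            window_end[unit] = discharge + timedelta(days=window_days)
        # Otherwise it is a readmission and leaves the window unchanged.
    return index_ids


print(sorted(flag_index_discharges(ADMISSIONS)))
```

On the sample rows this flags ids 1, 4, 6, 7, 8, 10, 11, 13 and 15, the same set as the target table in the question.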
I have a table in BQ that looks like this:
Row Field DateTime
1 one 10:00 AM
2 null 10:05 AM
3 null 10:10 AM
4 one 10:30 AM
5 null 11:00 AM
6 two 11:15 AM
7 two 11:30 AM
8 null 11:35 AM
9 null 11:40 AM
10 null 11:50 AM
11 null 12:00 AM
12 null 12:15 AM
13 two 12:30 AM
14 null 12:15 AM
15 null 12:25 AM
16 null 12:35 AM
17 three 12:55 AM
I want to create another column called prevField and fill it out with the last Field value that is not null, when the first and last entry around the null are the same. When the first and last entry around null are different, it should remain null. The result would look like the following:
Row Field DateTime prevField
1 one 10:00 AM null
2 null 10:05 AM one
3 null 10:10 AM one
4 one 10:30 AM one
5 null 11:00 AM null
6 two 11:15 AM two
7 two 11:30 AM two
8 null 11:35 AM two
9 null 11:40 AM two
10 null 11:50 AM two
11 null 12:00 AM two
12 null 12:15 AM two
13 two 12:30 AM two
14 null 12:15 AM null
15 null 12:15 AM null
16 null 12:15 AM null
17 three 12:15 AM three
So far I have tried the following code variations for the first part of the question (fill out prevField with the last Field value that is not null, when the first and last entry around the null are the same), but without success.
select Field, Datetime,
(1)--case when FieldName is null then LAG(FieldName) over (order by DateTime) else FieldName end as prevFieldName
(2)--LAST_VALUE(FieldName IGNORE NULLS) OVER (ORDER BY DateTime
(3)--ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevFieldName
(4)-- first_value(FieldName)over(order by DateTime) as prevFieldName
from table
EDIT: I added rows to the data and changed the row numbers.
You can use following logic to achieve your goal.
Sample Data creation:
WITH
Base AS
(
SELECT *
FROM(
SELECT 123 Row, 'one' Field, '10:00 AM' DateTime
UNION ALL
SELECT 123, null, '10:05 AM'
UNION ALL
SELECT
123, null, '10:10 AM'
UNION ALL
SELECT
123 , 'one' , '10:30 AM'
UNION ALL
SELECT
456,null,'11:00 AM'
UNION ALL
SELECT
456,'two','11:15 AM'
UNION ALL
SELECT
789,'two','11:30 AM'))
Logic: The query finds the minimum and maximum DateTime for each Field value, plus the LAG and LEAD values of Field for each row, and uses those to determine the prevField value.
SELECT a.Field,DateTime,
CASE WHEN a.DateTime = a.min_date THEN ''
WHEN a.lag_field IS NOT NULL and a.lead_field IS NULL THEN a.lag_field
WHEN a.lag_field IS NULL and a.lead_field IS NOT NULL THEN a.lead_field
WHEN a.lag_field != a.lead_field THEN a.lag_field
WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL AND a.DateTime = a.Max_date THEN a.Field
ELSE ''
END as prevField
FROM(
SELECT Base.Field,DateTime,LAG(Base.Field) over (order by DateTime)lag_field,Lead(Base.Field) over (order by DateTime) lead_field,min_date,Max_date
From Base LEFT JOIN (SELECT Field,MIN(DateTime) min_date,MAX(DateTime) Max_date FROM Base Group by Field) b
ON Base.Field = b.Field
) a
This query partly solves my problem:
CREATE TEMP FUNCTION ToHex(x INT64) AS (
(SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
);
SELECT
DateTime
, Field
, SUBSTR(MAX( ToHex(row_n) || Field) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
FROM (
SELECT *, ROW_NUMBER() over (ORDER BY DateTime) AS row_n
FROM `xx.yy.zz`
);
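The trick in that query is to prefix each non-null Field with a fixed-width hex row number, so that a plain string MAX over the preceding rows picks the most recent non-null value; the prefix is then cut off again. A minimal Python sketch of the same idea follows. The function name `previous_non_null` and the integer sort keys are assumptions for illustration, not part of the BigQuery code.

```python
def previous_non_null(rows):
    """rows: list of (sort_key, field); returns the last non-null field
    seen strictly before each row, in sort order."""
    ordered = sorted(rows, key=lambda r: r[0])
    tagged = []   # '<16 hex digits><field>' for each preceding non-null row
    result = []
    for row_n, (_, field) in enumerate(ordered, start=1):
        # MAX over the preceding rows: the fixed-width prefix makes the
        # highest row number win the string comparison.
        best = max(tagged) if tagged else None
        # Drop the 16-character prefix, like SUBSTR(..., 17) in the query.
        result.append(best[16:] if best is not None else None)
        if field is not None:
            tagged.append(f"{row_n:016x}{field}")
    return result


sample = [(1, "one"), (2, None), (3, None), (4, "one"),
          (5, None), (6, "two"), (7, "two"), (8, None)]
print(previous_non_null(sample))
```

This only forward-fills from earlier rows. It does not check whether the next non-null value matches, which is why the query solves only part of the problem.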
I have a query that looks like this:
SELECT DISTINCT
p.person_ID
,p.Last_Name
,ISNULL(p.Middle_Initial, '') AS Middle
,p.First_Name
,sh.Status_from_date
,sh.Status_thru_date
--(a)
FROM
person p
INNER JOIN
Person_Facilities f ON p.Person_ID = f.Person_ID
LEFT OUTER JOIN
rv_person_status_hist sh ON p.person_ID = sh.person_ID
ORDER BY
Last_Name
The returned data looks like this sort of thing (ignore the 2018 column for now):
Person_id Last_Name Middle First_Name Status_from_date Status_thru_date 2018
8000 Skywalker Dude Luke Null 2010-01-28 07:38 1
9000 Yoda Phinnius 2017-06-01 00:00 2019-05-31 00:00 1
1000 Lamb Little Mary 2018-07-01 00:00 2020-06-30 00:00 1
2000 Spider Bitsy Itsy 2016-11-01 00:00 2017-06-30 00:00 1
How do I add a column, say [2018], and put a 1 for if status_from_date to status_thru_date is in 2018, or a 0 if not?
I wanted to add the following at the --(a) in the query:
,(SELECT case
when exists
(
select * --
FROM dbo.RV_Person_status_hist
where
status_from_date is not null
and
('1-1-2018' between status_from_date and status_thru_date)
and status_from_date is not null
)
then 1 else 0 end )
AS [2018]
This doesn't seem to be working, though. See the 2018 column in the above table. It's showing 1 for all returned, and it's not excluding nulls. It's pretty complicated. status_from and status_thru could fall with 2018 in it, or 2018 could be inside status_from and status_thru, which should both be 1.
How do I exclude the nulls, and how do I show a 1 when the status date includes 2018?
I've looked at range within range, and return 0 or 1. I don't think I have all cases since the ranges overlap as well.
**Update:
I tried adding this at --(a) above instead, per the potential answer below:
,(SELECT status_from_date, status_thru_date,
case
when datepart(year, status_from_date)='2018'
or datepart(year, status_thru_date)='2018'
or (
status_from_date <= '01/01/2018'
and status_thru_date >= '12/12/2018'
)
then 1
else 0
end) AS [2018]
but I'm getting Ambiguous column name 'status_from_date'. Ambiguous
column name 'status_thru_date'. Only one expression can be specified
in the select list when the subquery is not introduced with EXISTS.
Any ideas? Figured it out.
**Update 2: How about this?
,(case when (
(
(sh.status_from_date is null or sh.status_from_date <= '2017-01-01') and
(sh.status_thru_date is null or sh.status_thru_date >= '2017-12-31')
)
or
(
(f.status_from_date is null or f.status_from_date <= '2017-01-01') and
(f.status_thru_date is null or f.status_thru_date >= '2017-12-31')
)
or
(
(datepart(year, sh.status_from_date)='2017') or
(datepart(year, sh.status_thru_date)='2017') or
(datepart(year, f.status_from_date)='2017') or
(datepart(year, f.status_from_date)='2017')
)
and
p.Sex='M'
)
then 1 else 0
end) as [2017_Male]
,(case when (
(
(sh.status_from_date is null or sh.status_from_date <= '2017-01-01') and
(sh.status_thru_date is null or sh.status_thru_date >= '2017-12-31')
)
or
(
(f.status_from_date is null or f.status_from_date <= '2017-01-01') and
(f.status_thru_date is null or f.status_thru_date >= '2017-12-31')
)
or
(
(datepart(year, sh.status_from_date)='2017') or
(datepart(year, sh.status_thru_date)='2017') or
(datepart(year, f.status_from_date)='2017') or
(datepart(year, f.status_from_date)='2017')
)
and
p.Sex='F'
)
then 1 else 0
end) as [2017_Female]--------
That one is putting a 1 in the 2017 column for both male and female for the data of: status_from: 2014-10-01 and status_to: 2016-09-30
You could do something like this, which checks whether the start or end date falls in 2018, or whether the range from start to thru spans the whole year:
CREATE TABLE #testTable (
Status_from_date DATETIME,
Status_thru_date DATETIME
)
INSERT INTO #testTable (
Status_from_date,
Status_thru_date
)
VALUES (
'2017-06-01 00:00',
'2019-05-31 00:00'
),
(
NULL,
'2010-01-28 07:38'
),
(
'2018-07-01 00:00',
'2020-06-30 00:00'
)
SELECT Status_from_date,
Status_thru_date,
CASE
WHEN datepart(year, Status_from_date) = '2018'
OR datepart(year, Status_thru_date) = '2018'
OR (
Status_from_date <= '01/01/2018'
AND Status_thru_date >= '12/31/2018'
)
THEN 1
ELSE 0
END AS '2018'
FROM #testTable
DROP TABLE #testTable
which produces:
Status_from_date Status_thru_date 2018
2017-06-01 00:00:00.000 2019-05-31 00:00:00.000 1
NULL 2010-01-28 07:38:00.000 0
2018-07-01 00:00:00.000 2020-06-30 00:00:00.000 1
If you want any overlaps in 2018, then:
(case when (status_from_date is null or status_from_date < '2019-01-01') and
(status_thru_date is null or status_thru_date >= '2018-01-01')
then 1 else 0
end) as is_2018
If you want overlaps of the complete year:
(case when (status_from_date is null or status_from_date <= '2018-01-01') and
(status_thru_date is null or status_thru_date >= '2018-12-31')
then 1 else 0
end) as is_2018
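The "any overlap" test is easy to verify against the dates mentioned in the question. Here is a minimal Python sketch of it, assuming (as the SQL above does) that a NULL on either end means the range is open-ended; the function name `overlaps_year` is made up for illustration.

```python
from datetime import date


def overlaps_year(start, end, year):
    """Return 1 if [start, end] touches any day of `year`, else 0.
    None on either side is treated as an open-ended range."""
    year_start = date(year, 1, 1)
    next_year_start = date(year + 1, 1, 1)
    starts_before_year_ends = start is None or start < next_year_start
    ends_after_year_starts = end is None or end >= year_start
    return 1 if (starts_before_year_ends and ends_after_year_starts) else 0


print(overlaps_year(date(2017, 6, 1), date(2019, 5, 31), 2018))   # range contains 2018
print(overlaps_year(None, date(2010, 1, 28), 2018))               # ended long before 2018
print(overlaps_year(date(2014, 10, 1), date(2016, 9, 30), 2017))  # ended before 2017
```

The two conditions are all that is needed: a range misses the year only if it starts after the year ends or ends before the year starts, so no separate check on the year part of either date is required.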
My table is as below:
id time_stamp Access Type
1001 2017-09-05 09:35:00 IN
1002 2017-09-05 11:00:00 IN
1001 2017-09-05 12:00:00 OUT
1002 2017-09-05 12:25:00 OUT
1001 2017-09-05 13:00:00 IN
1002 2017-09-05 14:00:00 IN
1001 2017-09-05 17:00:00 OUT
1002 2017-09-05 18:00:00 OUT
I have tried this query below:
SELECT ROW_NUMBER() OVER (
ORDER BY A.emp_reader_id ASC
) AS SNo
,B.emp_code
,B.emp_name
,CASE
WHEN F.event_entry_name = 'IN'
THEN A.DT
END AS in_time
,CASE
WHEN F.event_entry_name = 'OUT'
THEN A.DT
END AS out_time
,cast(left(CONVERT(TIME, a.DT), 5) AS VARCHAR) AS 'time'
,isnull(B.areaname, 'OAE6080036073000006') AS areaname
,C.dept_name
,b.emp_reader_id
,isnull(c.dept_name, '') AS group_name
,CONVERT(CHAR(11), '2017/12/30', 103) AS StartDate
,CONVERT(CHAR(11), '2018/01/11', 103) AS ToDate
,0 AS emp_card_no
FROM dbo.trnevents AS A
LEFT OUTER JOIN dbo.employee AS B ON A.emp_reader_id = B.emp_reader_id
LEFT OUTER JOIN dbo.departments AS C ON B.dept_id = C.dept_id
LEFT OUTER JOIN dbo.DevicePersonnelarea AS E ON A.POINTID = E.areaid
LEFT OUTER JOIN dbo.Event_entry AS F ON A.EVENTID = F.event_entry_id
ORDER BY A.emp_reader_id ASC
It runs, but the output looks like the rows below, with the IN and OUT times on separate rows. Sometimes the same IN or OUT event also appears more than once:
SNo emp_code emp_name in_time out_time time areaname dept_name emp_reader_id group_name StartDate ToDate emp_card_no
1 102 Ihsan Titi NULL 2017-12-30 12:16:26.000 12:16 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
2 102 Ihsan Titi NULL 2017-12-30 12:16:27.000 12:16 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
3 102 Ihsan Titi 2017-12-30 12:44:26.000 NULL 12:44 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
4 102 Ihsan Titi 2017-12-30 16:27:48.000 NULL 16:27 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
Expected output:
SNo emp_code emp_name in_time out_time time areaname dept_name emp_reader_id group_name StartDate ToDate emp_card_no
1 102 Ihsan Titi 2017-12-30 12:16:26.000 2017-12-30 12:44:26.000 12:16 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
2 102 Ihsan Titi 2017-12-30 12:50:26.000 2017-12-30 16:27:48.000 12:16 Dubai Sales 102 Sales 2017/12/30 2018/01/11 0
Kindly help; I am stuck trying to get output like this.
You can use this:
-- assumes EVENTID holds the text 'IN'/'OUT'; if it is a numeric id, join Event_entry as in the question
select A_In.emp_reader_id as empId, A_In.Belongs_to, A_In.DeviceSerialNumber,
A_In.DT as EntryTime,
(
-- earliest OUT event for the same employee later on the same day
select min(A_Out.DT)
from trnevents A_Out
where A_Out.EVENTID like 'OUT'
and A_Out.emp_reader_id = A_In.emp_reader_id
and A_Out.DT > A_In.DT and DATEDIFF(day, A_In.DT, A_Out.DT) = 0
) as ExitTime
from trnevents A_In
where A_In.EVENTID like 'IN'
The way I've approached it below is to say that if an event is the same type as the event before it then treat it as a "rogue".
Rogues always sit on their own, never paired with any other event.
All other events get paired such that IN is the first item and OUT is the second item.
Then I can group everything up to reduce pairs down to single rows.
WITH
rogue_check
AS
(
SELECT
CASE WHEN LAG(F.event_entry_name) OVER (PARTITION BY A.emp_reader_number ORDER BY A.DT) = F.event_entry_name THEN 1 ELSE 0 END AS is_rogue,
*
FROM
trnevents AS A
LEFT JOIN
EVent_entry AS F
ON F.event_entry_id = A.event_id
),
sorted AS
(
SELECT
ROW_NUMBER() OVER ( ORDER BY DT) AS event_sequence_id,
ROW_NUMBER() OVER (PARTITION BY emp_reader_number, is_rogue ORDER BY DT) AS employee_checked_event_sequence_id,
*
FROM
rogue_check
)
SELECT
MIN(event_sequence_id) AS unique_id,
emp_reader_number,
MAX(CASE WHEN event_entry_name = 'IN' THEN DT END) AS time_in,
MAX(CASE WHEN event_entry_name = 'OUT' THEN DT END) AS time_out
FROM
sorted
GROUP BY
emp_reader_number,
is_rogue,
employee_checked_event_sequence_id - CASE WHEN is_rogue = 1 OR event_entry_name = 'IN' THEN 0 ELSE 1 END
ORDER BY
emp_reader_number,
unique_id
;
Example Schema:
CREATE TABLE trnevents (
emp_reader_number INT,
DT DATETIME,
event_id INT
);
CREATE TABLE Event_entry (
event_entry_id INT,
event_entry_name NVARCHAR(32)
);
Example Data:
INSERT INTO Event_entry VALUES (0, N'IN'), (1, N'OUT');
INSERT INTO trnevents VALUES
(1, '2017-01-01 08:00', 0),
(1, '2017-01-01 08:01', 0),
(1, '2017-01-01 12:00', 1),
(1, '2017-01-01 13:00', 0),
(1, '2017-01-01 17:00', 1),
(1, '2017-01-01 17:01', 1)
;
Example Results:
unique_id emp_reader_number time_in time_out
1 1 01/01/2017 08:00:00 01/01/2017 12:00:00
2 1 01/01/2017 08:01:00 null
4 1 01/01/2017 13:00:00 01/01/2017 17:00:00
6 1 null 01/01/2017 17:01:00
The GROUP BY turned out a bit more fiddly than I anticipated on the train and so may cause an expensive SORT in the execution plan for large data sets. I'll also think about an alternative shortly.
Here is a demo with some simple dummy data demonstrating that it works for those cases at least. (Feel free to update it with other cases if they demonstrate any problems)
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=d06680d8ed374666760cdc67182aaacb
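For readers who want to trace the rogue rule by hand, here is a minimal Python sketch of the pairing described above for a single employee. It is a procedural restatement, not a translation of the SQL, and the name `pair_events` and the ISO timestamp strings are assumptions for illustration.

```python
def pair_events(events):
    """events: list of (timestamp, kind) for one employee, kind 'IN' or 'OUT'.
    Returns (time_in, time_out) tuples in order of first appearance."""
    rows = []          # each row is [time_in, time_out]
    open_row = None    # non-rogue IN still waiting for its OUT
    prev_kind = None
    for ts, kind in sorted(events):
        if kind == prev_kind:
            # Same type as the previous event: a rogue, kept on its own row.
            rows.append([ts, None] if kind == "IN" else [None, ts])
        elif kind == "IN":
            rows.append([ts, None])
            open_row = rows[-1]
        elif open_row is not None:
            # Non-rogue OUT closes the pending non-rogue IN.
            open_row[1] = ts
            open_row = None
        else:
            # OUT with no IN before it.
            rows.append([None, ts])
        prev_kind = kind
    return [tuple(row) for row in rows]


sample = [
    ("2017-01-01 08:00", "IN"), ("2017-01-01 08:01", "IN"),
    ("2017-01-01 12:00", "OUT"), ("2017-01-01 13:00", "IN"),
    ("2017-01-01 17:00", "OUT"), ("2017-01-01 17:01", "OUT"),
]
for row in pair_events(sample):
    print(row)
```

On the example data this yields the same four rows as the example results: two complete pairs, one rogue IN at 08:01 and one rogue OUT at 17:01.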
You can use a PIVOT
select id, [in], [out]
from
( select
id, time_stamp, accessType,
(ROW_NUMBER() over (partition by id order by time_stamp) -1 )/ 2 rn
from yourtable ) src
pivot
(min(time_stamp) for accessType in ([in],[out])) p
This assumes that each "in" is followed by an "out" and uses row_number to group those pairs of times.
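The row_number arithmetic is the core of that approach: integer-dividing a zero-based row number by 2 puts rows 1 and 2 in one bucket, rows 3 and 4 in the next, and so on. A minimal Python sketch of the same bucketing, using the sample table from the question (the name `pair_in_out` is made up for illustration):

```python
from collections import defaultdict


def pair_in_out(rows):
    """rows: list of (id, time_stamp, access_type).
    Returns (id, in_time, out_time) for each consecutive pair of events."""
    by_id = defaultdict(list)
    for emp_id, ts, access in rows:
        by_id[emp_id].append((ts, access))

    result = []
    for emp_id in sorted(by_id):
        events = sorted(by_id[emp_id])       # order by time_stamp
        for i in range(0, len(events), 2):   # same buckets as (row_number - 1) / 2
            pair = events[i:i + 2]
            ins = [ts for ts, access in pair if access == "IN"]
            outs = [ts for ts, access in pair if access == "OUT"]
            # min(...) mirrors the aggregate used inside the PIVOT.
            result.append((emp_id, min(ins, default=None), min(outs, default=None)))
    return result


TABLE = [
    (1001, "2017-09-05 09:35:00", "IN"), (1002, "2017-09-05 11:00:00", "IN"),
    (1001, "2017-09-05 12:00:00", "OUT"), (1002, "2017-09-05 12:25:00", "OUT"),
    (1001, "2017-09-05 13:00:00", "IN"), (1002, "2017-09-05 14:00:00", "IN"),
    (1001, "2017-09-05 17:00:00", "OUT"), (1002, "2017-09-05 18:00:00", "OUT"),
]
for row in pair_in_out(TABLE):
    print(row)
```

As with the PIVOT, this relies on the events strictly alternating. A duplicated IN or OUT shifts every later pair for that id.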
I am using SQL Server 2008 and am trying to increase the speed of my query below. The query assigns points to patients based on readmission dates.
Example: A patient is seen on 1/2, 1/5, 1/7, 1/8, 1/9, 2/4. I want to first group visits within 3 days of each other. 1/2-5 are grouped, 1/7-9 are grouped. 1/5 is NOT grouped with 1/7 because 1/5's actual visit date is 1/2. 1/7 would receive 3 points because it is a readmit from 1/2. 2/4 would also receive 3 points because it is a readmit from 1/7. When the dates are grouped the first date is the actual visit date.
Most articles suggest limiting the data set or adding indexes to increase speed. I have limited the number of rows to about 15,000 and added an index. When running the query with 45 test visit dates / 3 test patients, the query takes 1.5 min to run. With my actual data set it takes > 8 hrs.
How can I get this query to run < 1 hr? Is there a better way to write my query? Does my Index look correct? Any help would be greatly appreciated.
Example expected results below query.
;CREATE TABLE RiskReadmits(MRN INT, VisitDate DATE, Category VARCHAR(15))
;CREATE CLUSTERED INDEX Risk_Readmits_Index ON RiskReadmits(VisitDate)
;INSERT RiskReadmits(MRN,VisitDate,CATEGORY)
VALUES
(1, '1/2/2016','Inpatient'),
(1, '1/5/2016','Inpatient'),
(1, '1/7/2016','Inpatient'),
(1, '1/8/2016','Inpatient'),
(1, '1/9/2016','Inpatient'),
(1, '2/4/2016','Inpatient'),
(1, '6/2/2016','Inpatient'),
(1, '6/3/2016','Inpatient'),
(1, '6/5/2016','Inpatient'),
(1, '6/6/2016','Inpatient'),
(1, '6/8/2016','Inpatient'),
(1, '7/1/2016','Inpatient'),
(1, '8/1/2016','Inpatient'),
(1, '8/4/2016','Inpatient'),
(1, '8/15/2016','Inpatient'),
(1, '8/18/2016','Inpatient'),
(1, '8/28/2016','Inpatient'),
(1, '10/12/2016','Inpatient'),
(1, '10/15/2016','Inpatient'),
(1, '11/17/2016','Inpatient'),
(1, '12/20/2016','Inpatient')
;WITH a AS (
SELECT
z1.VisitDate
, z1.MRN
, (SELECT MIN(VisitDate) FROM RiskReadmits WHERE VisitDate > DATEADD(day, 3, z1.VisitDate)) AS NextDay
FROM
RiskReadmits z1
WHERE
CATEGORY = 'Inpatient'
), a1 AS (
SELECT
MRN
, MIN(VisitDate) AS VisitDate
, MIN(NextDay) AS NextDay
FROM
a
GROUP BY
MRN
), b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a
JOIN b
ON a.VisitDate = b.NextDay
), c AS (
SELECT
MRN,
VisitDate
, (SELECT MAX(VisitDate) FROM b WHERE b1.VisitDate > VisitDate AND b.MRN = b1.MRN) AS PreviousVisitDate
FROM
b b1
)
SELECT distinct
c1.MRN,
c1.VisitDate
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN PreviousVisitDate
ELSE NULL
END AS ReAdmissionFrom
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN 3
ELSE 0
END AS Points
FROM
c c1
ORDER BY c1.MRN
Expected Results:
MRN VisitDate ReAdmissionFrom Points
1 2016-01-02 NULL 0
1 2016-01-07 2016-01-02 3
1 2016-02-04 2016-01-07 3
1 2016-06-02 NULL 0
1 2016-06-06 2016-06-02 3
1 2016-07-01 2016-06-06 3
1 2016-08-01 NULL 0
1 2016-08-15 2016-08-01 3
1 2016-08-28 2016-08-15 3
1 2016-10-12 NULL 0
1 2016-11-17 NULL 0
1 2016-12-20 NULL 0
Oops, I changed the names of a few CTEs (and the post mangled what was code).
It should be like this:
b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)
I'm going to take a wild guess here and say you want to change the b cte to
have AND a.MRN = b.MRN as a second condition in the second select query like this:
, b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
firstVisitAndFollowUp
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
visitsDistance3daysOrMore AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)
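As a cross-check on the grouping rule in the question, the expected results can be reproduced with a single ordered pass per patient. The following is a minimal Python sketch under the assumptions that a visit joins a group when it is at most 3 days after the group's first date, and that a group earns points when it starts fewer than 30 days after the previous group's first date. The name `score_visits` is made up, and the data is the sample for MRN 1.

```python
from datetime import date


def score_visits(visit_dates, group_days=3, readmit_days=30, points=3):
    """Return (visit_date, readmission_from, points) per visit group."""
    results = []
    group_start = None
    for visit in sorted(visit_dates):
        # Within 3 days of the current group's first date: same visit.
        if group_start is not None and (visit - group_start).days <= group_days:
            continue
        previous = group_start
        group_start = visit
        if previous is not None and (visit - previous).days < readmit_days:
            results.append((visit, previous, points))
        else:
            results.append((visit, None, 0))
    return results


MRN_1_VISITS = [date(2016, m, d) for m, d in [
    (1, 2), (1, 5), (1, 7), (1, 8), (1, 9), (2, 4), (6, 2), (6, 3), (6, 5),
    (6, 6), (6, 8), (7, 1), (8, 1), (8, 4), (8, 15), (8, 18), (8, 28),
    (10, 12), (10, 15), (11, 17), (12, 20),
]]
for row in score_visits(MRN_1_VISITS):
    print(row)
```

Each patient needs only one sorted scan, so running this per MRN avoids the repeated correlated lookups that the recursive query performs.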