SQL Conditional sum and grouping - sql

I have a query that's trying to sum up a patient's length of stay at a hospital. Here is an example of the data
| Patient | Admission_ID | Admission_Event_ID | Admission_Event_Type | Start Date | End Date | Duration | Linked_Admission |
| P0001 | ADM0001 | AE1 | (formal) Separation | 2012-12-18 | 2012-12-18 | 0 | ADM0002 |
| P0001 | ADM0001 | AE2 | Statistical Admission | 2012-12-17 | 2012-12-18 | 1 | ADM0002 |
| P0001 | ADM0002 | AE3 | Statistical Separation| 2012-12-17 | 2012-12-17 | 0 | NULL |
| P0001 | ADM0002 | AE4 | (formal) Admission | 2012-11-30 | 2012-12-17 | 17 | NULL |
| P0002 | ADM0003 | AE5 | (formal) Admission | 2012-11-30 | 2012-12-25 | 25 | NULL |
. . .
EDIT: Forgot to mention, there is a column that links the admission ID (only used when the patient is statistically separated and admitted)
By definition, the length of stay is calculated for each patient from the start of their admission until they are formally separated (statistical separations and admissions continue the same stay, but they are given a new Admission ID).
A report is run to find the average length of stay (ALOS) for the hospital and its units; the user selects two dates to report between. I've used a CTE (let's call it CTESep) to get all the patients who were formally separated within the reporting period. I then use another CTE (called CTEAdmissions) to get all the admissions of the patients in CTESep. This is where I get stuck.
I need to sum up the Durations of the patient to get their total length of stay for that admission (which is a combination of ADM0001 and ADM0002) so the total LOS will be 18, rather than 17 and 1.
My idea was to
ORDER BY Patient
, End_Date DESC
, adm_id
, CASE WHEN
Admission_Event_Type = '(formal) Separation ' THEN 1
WHEN Admission_Event_Type = 'Statistical Admission ' THEN 2
WHEN Admission_Event_Type = 'Statistical Separation' THEN 3
WHEN Admission_Event_Type = '(formal) Admission ' THEN 4
END ASC
Then sum up the duration based on a condition. The rule is: 'start summing the duration of each patient's admission from a formal separation back to a formal admission'. I'm not sure how to do that.
I've tried:
SELECT SUM(Duration) OVER(PARTITION BY Patient) AS 'Sum'
But that will give me the total LOS for the patient across ALL their admissions (if they have more than one separation within that reporting period)
I've also tried
SELECT SUM(Duration) OVER(PARTITION BY Patient, Admission_ID) AS 'Sum'
But of course that gives me the LOS of a patient between a formal admission and a statistical separation (and not the LOS by its actual definition).
Anyone got a different way of tackling this problem? By the way, using Sybase

How about this:
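-- Assumes Linked_Admission points at the admission that holds the formal admission,
-- so ADM0001 (linked to ADM0002) is grouped together with ADM0002.
select Patient,
coalesce(Linked_Admission, Admission_ID) as stay_id,
datediff(day,
min(case when Admission_Event_Type = '(formal) Admission' then Start_Date end),
max(case when Admission_Event_Type = '(formal) Separation' then End_Date end)
) as total_length
from data
group by Patient, coalesce(Linked_Admission, Admission_ID)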
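As a rough check of the grouping idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (not Sybase) and sums Duration per stay. The table name `events` and the lower-case column names are assumptions made for the example, and it assumes a linked admission always points directly at the admission it continues.

```python
import sqlite3

# Sample rows from the question; table and column names are assumed for this sketch.
rows = [
    ("P0001", "ADM0001", "(formal) Separation", 0, "ADM0002"),
    ("P0001", "ADM0001", "Statistical Admission", 1, "ADM0002"),
    ("P0001", "ADM0002", "Statistical Separation", 0, None),
    ("P0001", "ADM0002", "(formal) Admission", 17, None),
    ("P0002", "ADM0003", "(formal) Admission", 25, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (patient TEXT, admission_id TEXT, event_type TEXT,"
    " duration INTEGER, linked_admission TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# A linked admission is folded into the admission it points at, so
# ADM0001 and ADM0002 end up under the same grouping key.
query = """
    SELECT patient,
           COALESCE(linked_admission, admission_id) AS stay_id,
           SUM(duration) AS total_los
    FROM events
    GROUP BY patient, COALESCE(linked_admission, admission_id)
    ORDER BY patient, stay_id
"""
result = conn.execute(query).fetchall()
print(result)  # [('P0001', 'ADM0002', 18), ('P0002', 'ADM0003', 25)]
```

A stay that is statistically separated more than once would need the link followed repeatedly, which this single COALESCE does not do.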

Related

Finding how many days left per user per year

I have a table that tracks leave days for each user:
ID | Start | End | IDUser
1 | 02-02-2020 | 03-02-2020 | 2
2 | 01-02-2020 | 21-02-2020 | 1
IDUser connects to the Users Table, that has IDUser and Username columns
I have a view that shows the previously mentioned columns plus a column named UsedDays that counts how many leave days were used:
DATEDIFF(DAY, dbo.leavedays.start, dbo.leavedays.[end]) + 1
This is what I have now:
Start | End | IDUser | UsedDays
02-02-2020 | 03-02-2020 | 2 | 1
01-02-2020 | 21-02-2020 | 1 | 20
Each user has a total available number of days per year so I would like to have a column that subtracts from those total possible days of each user, and show how many are left.
Example:
John (IDUser = 2) has 30 days available this year and he already used 1, so there are 29 left
Start | End | IDUser | TotalDaysYear | UsedDays | LeftDays
02-02-2020 | 03-02-2020 | 2 | 30 | 1 | 29
01-02-2020 | 21-02-2020 | 1 | 20 | 20 | 0
I believe I have to create a table for TotalDaysYear, probably with:
ID | Year | TotalDaysYear | IDUser
1 | 2020 | 30 | 2
2 | 2020 | 20 | 1
IDUser connects to the Users Table, that has IDUser and Username columns
But I'm having trouble finding the logic for the relationship and how to get the result that I want, since it also depends on the year (available days may change per year, per user).
Assuming you are using SQL Server, this should work:
SELECT
ld.start,
ld.[end],
ld.IDUser,
ldy.TotalDaysYear,
SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as UsedDays,
ldy.TotalDaysYear - SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as LeftDays
FROM leavedays ld
LEFT JOIN leavedaysperyear ldy
ON YEAR(ld.start) = ldy.Year AND ld.IDUser = ldy.IDUser
The basic idea is to keep a running total of used days per user, per year, and then subtract it from the total available days for that user during that same year.
NB. The example provided doesn't handle leave periods across years
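To check the arithmetic of the running total, here is a small plain-Python sketch of the same logic (not SQL Server code). The function name `remaining_leave` and the dictionary layout for the per-year allowance are assumptions made for the example; the rows follow the question's tables.

```python
from collections import defaultdict
from datetime import date

# (start, end, id_user), following the question's sample rows
leave_days = [
    (date(2020, 2, 2), date(2020, 2, 3), 2),
    (date(2020, 2, 1), date(2020, 2, 21), 1),
]
# (id_user, year) -> TotalDaysYear, standing in for the proposed allowance table
total_days_year = {(2, 2020): 30, (1, 2020): 20}

def remaining_leave(leave_days, total_days_year):
    used = defaultdict(int)  # running total per (user, year)
    out = []
    for start, end, user in sorted(leave_days, key=lambda r: (r[2], r[0])):
        key = (user, start.year)
        used[key] += (end - start).days + 1  # same as DATEDIFF(DAY, start, end) + 1
        allowance = total_days_year.get(key)  # None mirrors an unmatched LEFT JOIN
        left = None if allowance is None else allowance - used[key]
        out.append((start, end, user, allowance, used[key], left))
    return out

result = remaining_leave(leave_days, total_days_year)
for row in result:
    print(row)
```

Note that with the inclusive `+ 1`, 02-02-2020 to 03-02-2020 counts as 2 days rather than the 1 shown in the question's table; drop the `+ 1` if the end date should not be counted.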

Find patients that develop a more severe disease

I have a table patient_details that has id, diagnosis_date and diagnosis_code. A unique ID can have multiple entries meaning they were diagnosed with different diseases at different times.
GOAL: I want to see patients that eventually progress to having disease code 5.10. So I want to see patients who were first diagnosed with code 5 and then progress to diagnosis 5.10. I am not sure how to isolate the dates for each unique patient and see who went from an initial diagnosis of 5 to eventually 5.10. I ultimately just need the count of patients who go from diagnosis code 5 to 5.10
Example of table:
ID |diagnosis_date|diagnosis_code
PT2073|2015-02-28 |5
PT2073|2019-02-28 |5.10
PT2013|2015-04-28 |1
PT2013|2017-02-11 |5
PT2013|2017-07-11 |5.10
This might do the trick:
select id
from patient_details
group by id
having
min(case when diagnosis_code = 5 then diagnosis_date end)
< max(case when diagnosis_code = 5.1 then diagnosis_date end)
This will ensure that:
the patient has at least one record with diagnosis_code = 5 and another with diagnosis_code = 5.1
the date they were first diagnosed with code 5 is earlier than the date they were last diagnosed with code 5.1
Demo on DB Fiddle
Sample data:
id | diagnosis_date | diagnosis_code
:----- | :------------- | -------------:
PT2073 | 2015-02-28 | 4.00
PT2073 | 2019-02-28 | 5.10
PT2013 | 2015-04-28 | 1.00
PT2013 | 2017-02-11 | 5.00
PT2013 | 2017-07-11 | 5.10
Results:
| id |
| :----- |
| PT2013 |
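Since the question ultimately asks for a count, the query can be wrapped in a COUNT. The sketch below runs it against the demo's sample data in an in-memory SQLite database; storing diagnosis_code as text ('5', '5.10') is an assumption made here, and the subquery alias `progressed` is a name chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_details (id TEXT, diagnosis_date TEXT, diagnosis_code TEXT)"
)
conn.executemany(
    "INSERT INTO patient_details VALUES (?, ?, ?)",
    [
        ("PT2073", "2015-02-28", "4"),
        ("PT2073", "2019-02-28", "5.10"),
        ("PT2013", "2015-04-28", "1"),
        ("PT2013", "2017-02-11", "5"),
        ("PT2013", "2017-07-11", "5.10"),
    ],
)

# ISO-formatted date strings compare correctly as text.
# A patient without a code-5 row yields NULL on the left side, so HAVING rejects them.
count = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT id
        FROM patient_details
        GROUP BY id
        HAVING MIN(CASE WHEN diagnosis_code = '5'    THEN diagnosis_date END)
             < MAX(CASE WHEN diagnosis_code = '5.10' THEN diagnosis_date END)
    ) AS progressed
""").fetchone()[0]
print(count)  # 1
```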

Query target conditions

I need to query a table accounting for multiple change events. The table (seen below) is partitioned by Date, where a snapshot of employees is taken every day. I would like to create a table that shows milestone changes.
Namely I want the final export to show:
First Date they appear (hire date)
Any record when the Type changes
Last Date they appear (termination date)
This ultimately shows the changes in Type along with the hire/termination date.
I'm wondering what a good way to build this is? I can see a query that takes the UNION of the 3 criteria listed above and then sorts by date then employee but am not sure if this is efficient.
Table
+-----------+------+----------+--------+
| Employee | Type | Date | Active |
+-----------+------+----------+--------+
| urdearboy | 1 | 1/1/2019 | 1 | '<---- Want
+-----------+------+----------+--------+
| urdearboy | 1 | 1/2/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/3/2019 | 1 | '<---- Want
+-----------+------+----------+--------+
| urdearboy | 4 | 1/4/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/5/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/6/2019 | 1 |
+-----------+------+----------+--------+
| urdearboy | 4 | 1/7/2019 | 0 | '<---- Want
+-----------+------+----------+--------+
In the above it can be deduced I was:
Hired 1/1/19
Changed Type 1/3/19
Terminated 1/7/19
One method is to use lag():
select t.*
from (select t.*,
lag(date) over (partition by employee, type, active order by date) as prev_date_eta,
lag(date) over (partition by employee order by date) as prev_date
from t
) t
where prev_date_eta is null or
prev_date_eta <> prev_date;
This approach compares the previous date with the same attributes to the overall previous date for the employee. When these are the same, nothing has changed, so the row is filtered out.
The use of partition by is a big convenience when you want to compare multiple columns. The alternative is basically to compare each column individually.
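The comparison of the two lag() values can be traced in plain Python. This is only a sketch of the logic, not the SQL itself; the function name `milestones` and the two dictionaries are names chosen for the example.

```python
from datetime import date

# (employee, type, date, active), taken from the question's table
snapshots = [
    ("urdearboy", 1, date(2019, 1, 1), 1),
    ("urdearboy", 1, date(2019, 1, 2), 1),
    ("urdearboy", 4, date(2019, 1, 3), 1),
    ("urdearboy", 4, date(2019, 1, 4), 1),
    ("urdearboy", 4, date(2019, 1, 5), 1),
    ("urdearboy", 4, date(2019, 1, 6), 1),
    ("urdearboy", 4, date(2019, 1, 7), 0),
]

def milestones(rows):
    last_any = {}   # employee -> date of the previous row (lag partitioned by employee)
    last_same = {}  # (employee, type, active) -> date of the previous row with same attributes
    kept = []
    for emp, typ, day, active in sorted(rows, key=lambda r: (r[0], r[2])):
        prev_any = last_any.get(emp)
        prev_same = last_same.get((emp, typ, active))
        # Keep the row when no earlier row shares its attributes, or when the
        # most recent such row is not the immediately preceding snapshot.
        if prev_same is None or prev_same != prev_any:
            kept.append((emp, typ, day, active))
        last_any[emp] = day
        last_same[(emp, typ, active)] = day
    return kept

kept = milestones(snapshots)
print([row[2].isoformat() for row in kept])  # ['2019-01-01', '2019-01-03', '2019-01-07']
```

The last row is picked up here only because Active flips to 0 on that date; an employee whose final snapshot still has the same Type and Active as the day before would not get a termination row from this logic.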

Impala SQL Stockpiling Algorithm

I have prescription drug data that has a prescription date and the number of days supplied for that prescription. I am trying to estimate actual drug intake dates, which can differ from the prescription date if people (1) refill their prescription before their current prescription is done or (2) lose their current prescription and so need a refill.
Below is sample data for 1 patient:
| patient_id | rx_start_date | days_supply |
|------------|---------------|-------------|
| 1 | 1/10/2013 | 3 |
| 1 | 1/11/2013 | 3 |
| 1 | 1/14/2013 | 3 |
Without adjusting for stockpiling the end dates are calculated as rx_start_date + days_supply - 1 see:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/11/2013 | 3 | 1/13/2013 |
| 1 | 1/14/2013 | 3 | 1/16/2013 |
As you can see the start date for the 2nd prescription is overlapped by the first prescription. If we assume that they filled their prescription early then the actual intake date for the 2nd prescription should start on 1/13/2013. But moving the end date of the 2nd prescription causes an overlap over the 3rd prescription and so that must be moved as well. See the expected resulting table below:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/13/2013 | 3 | 1/15/2013 |
| 1 | 1/16/2013 | 3 | 1/18/2013 |
The other case: if the current prescription overlaps the next one by more than 50%, then we assume they lost their prescription and the 2nd prescription's start date is the actual intake date. This means, though, that we need to truncate the current prescription to end when the 2nd one starts.
The algorithm is relatively simple as a non-SQL iterative solution, but I'm having trouble with a generic SQL solution, since adjusting dates at time X could cause a cascading effect that adjusts many other dates. I'm using Impala SQL, so recursive CTEs are not an option, and I'd like this to work on other databases, so database-specific functions are not ideal either.
The following should give you what you are looking for, so long as there are no gaps in the treatment regime:
with aggs as (
    select d1.patient_id, d1.rx_start_dt,
           sum(ds.days_supply) days_supply,
           min(ds.rx_start_dt) + sum(ds.days_supply) - 1 end_dt
    from drugs d1
    inner join drugs ds
      on ds.patient_id = d1.patient_id and ds.rx_start_dt <= d1.rx_start_dt
    group by d1.patient_id, d1.rx_start_dt
)
select patient_id,
       coalesce(lag(end_dt + 1) over (partition by patient_id order by rx_start_dt), rx_start_dt) start_dt,
       end_dt
from aggs;
Using the given sample data, this gives as output:
ID Start End
1 2013-01-10 2013-01-12
1 2013-01-13 2013-01-15
1 2013-01-16 2013-01-18
This was tested on Oracle, but all the functions used appear to be available in Impala as well, so it should work there too.
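For comparison, the iterative version that the question describes as simple might look like the following sketch, which can serve as a reference when checking a SQL solution. It covers only the early-refill rule (not the 50% lost-prescription rule), and the function name `shift_for_stockpiling` is an assumption made for the example.

```python
from datetime import date, timedelta

def shift_for_stockpiling(prescriptions):
    """prescriptions: list of (rx_start_date, days_supply) for one patient."""
    adjusted = []
    prev_end = None
    for rx_start, days in sorted(prescriptions):
        if prev_end is None:
            start = rx_start
        else:
            # An early refill starts the day after the previous supply runs out.
            start = max(rx_start, prev_end + timedelta(days=1))
        end = start + timedelta(days=days - 1)
        adjusted.append((start, end))
        prev_end = end
    return adjusted

sample = [
    (date(2013, 1, 10), 3),
    (date(2013, 1, 11), 3),
    (date(2013, 1, 14), 3),
]
adjusted = shift_for_stockpiling(sample)
for start, end in adjusted:
    print(start, end)
```

Because it takes the later of the prescription date and the day after the previous end date, this loop also leaves a gap in place when a refill comes late, which is the case the SQL above does not cover.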

MDX : Combine two role playing dimension, with multi values

I've been working on an MDX question for a few days and I can't see a way to solve it...
Here is the context :
I have a fact table :
+----------+--------+-------------+------------+------------------+
| Line num | Amount | Line Type | Date | DateConfirmation |
+----------+--------+-------------+------------+------------------+
| 1 | 100 | Reservation | 01/01/2016 | 12/01/2016 |
| 2 | 50 | Reservation | 01/01/2016 | Empty |
| 3 | 80 | Reservation | 20/12/2015 | 01/01/2016 |
| 4 | 30 | DirectSales | 01/01/2016 | 01/01/2016 |
+----------+--------+-------------+------------+------------------+
So in SSAS i have designed a cube with
Amount measure
Date dimension
Date Confirmation dimension
The two date dimensions are role-playing dimensions.
What I need, when analysing by Date, is to combine all the reservations plus the reservations that were confirmed on the same date as the current member.
So I've written this MDX:
CREATE MEMBER CURRENTCUBE.[Sales].[Type].[All].[Confirmed Reservations]
AS NULL ,
VISIBLE = 1;
Scope ( [Sales].[Type].[All].[Confirmed Reservations] );
Scope( MeasureGroupMeasures("Sales") , [Date].[Hierarchy].Members , [Date].[Date].Members
, [Date Confirmation].[Hierarchy].[All] , [Date Confirmation].[Date].[All] );
This = ([Sales].[Type].&[Reservation], StrToMember("[Date Confirmation].[Hierarchy]." + Right(MemberToStr([Date].[Hierarchy].CurrentMember), Len(MemberToStr([Date].[Hierarchy].CurrentMember)) - Len("[Date].[Hierarchy].") ) ), [Date].[Hierarchy].[All] );
End Scope;
End Scope;
The expected result, if I analyse the sales & reservations with the Date dimension at 01/01/2016 is
+------------------------+-----------+
| Reservation | 150 (1+2) |
| DirectSales | 30 (4) |
| Confirmed Reservations | 80 (3) |
+------------------------+-----------+
This works perfectly if I select only one date in Excel, but it produces wrong results when more than one date is selected.
All your suggestions would be very helpful!
Many thanks to all :)
Instead of trying to tackle this in MDX I would suggest a simpler approach. If your current fact table query in the DSV is:
Select LineNum, Amount, LineType, Date, DateConfirmation
From YourFact
I would change it to:
Select LineNum, Amount, LineType, Date, DateConfirmation
From YourFact
UNION ALL
Select LineNum, Amount, 'Confirmed Reservations' as LineType, DateConfirmation as Date, DateConfirmation
From YourFact
WHERE DateConfirmation is not null
AND LineType = 'Reservation'
Then you shouldn't need any MDX.
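A quick way to sanity-check the reshaped fact query is to run it on the question's four sample lines. The sketch below uses an in-memory SQLite database with ISO dates; the filter on LineType = 'Reservation' in the second branch is an assumption added here so that the confirmed direct sale (line 4) is not counted as a confirmed reservation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourFact (LineNum INTEGER, Amount INTEGER, LineType TEXT,"
    " Date TEXT, DateConfirmation TEXT)"
)
conn.executemany(
    "INSERT INTO YourFact VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "Reservation", "2016-01-01", "2016-01-12"),
        (2, 50, "Reservation", "2016-01-01", None),
        (3, 80, "Reservation", "2015-12-20", "2016-01-01"),
        (4, 30, "DirectSales", "2016-01-01", "2016-01-01"),
    ],
)

# Each confirmed reservation appears twice: once on its own date with its
# original type, and once on its confirmation date as 'Confirmed Reservations'.
totals = dict(conn.execute("""
    SELECT LineType, SUM(Amount)
    FROM (
        SELECT LineNum, Amount, LineType, Date, DateConfirmation
        FROM YourFact
        UNION ALL
        SELECT LineNum, Amount, 'Confirmed Reservations', DateConfirmation, DateConfirmation
        FROM YourFact
        WHERE DateConfirmation IS NOT NULL
          AND LineType = 'Reservation'
    ) AS f
    WHERE Date = '2016-01-01'
    GROUP BY LineType
""").fetchall())
print(totals)
```

For 01/01/2016 this gives 150 for Reservation, 30 for DirectSales and 80 for Confirmed Reservations, matching the expected result in the question.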