I have a table patient_details that has id, diagnosis_date and diagnosis_code. A single ID can have multiple entries, meaning the patient was diagnosed with different diseases at different times.
GOAL: I want to see patients who eventually progress to disease code 5.10, i.e. patients who were first diagnosed with code 5 and later progressed to diagnosis 5.10. I am not sure how to isolate the dates for each patient and see who went from an initial diagnosis of 5 to an eventual 5.10. Ultimately, I just need the count of patients who go from diagnosis code 5 to 5.10.
Example of table:
| ID     | diagnosis_date | diagnosis_code |
|--------|----------------|----------------|
| PT2073 | 2015-02-28     | 5              |
| PT2073 | 2019-02-28     | 5.10           |
| PT2013 | 2015-04-28     | 1              |
| PT2013 | 2017-02-11     | 5              |
| PT2013 | 2017-07-11     | 5.10           |
This might do the trick:
select id
from patient_details
group by id
having
min(case when diagnosis_code = 5 then diagnosis_date end)
< max(case when diagnosis_code = 5.1 then diagnosis_date end)
This will ensure that:
the patient has at least one record with diagnosis_code = 5 and another with diagnosis_code = 5.1 (numerically the same as 5.10)
the date they were first diagnosed with code 5 is earlier than the date they were last diagnosed with 5.1
Sample data:
id | diagnosis_date | diagnosis_code
:----- | :------------- | -------------:
PT2073 | 2015-02-28 | 4.00
PT2073 | 2019-02-28 | 5.10
PT2013 | 2015-04-28 | 1.00
PT2013 | 2017-02-11 | 5.00
PT2013 | 2017-07-11 | 5.10
Results:
| id |
| :----- |
| PT2013 |
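Since the question ultimately asks for a count rather than a list of IDs, one option is to wrap the query above in a count(*). The sketch below runs that in SQLite through Python's built-in sqlite3 module on the question's own sample rows. It assumes diagnosis_code is stored as a number (so 5.10 and 5.1 compare equal) and diagnosis_date as ISO text (so string comparison follows date order); the function name count_progressed is hypothetical.

```python
import sqlite3

# The question's sample rows; diagnosis_code is numeric, so 5.10 == 5.1.
ROWS = [
    ("PT2073", "2015-02-28", 5),
    ("PT2073", "2019-02-28", 5.10),
    ("PT2013", "2015-04-28", 1),
    ("PT2013", "2017-02-11", 5),
    ("PT2013", "2017-07-11", 5.10),
]

# The answer's GROUP BY / HAVING query, wrapped in count(*).
QUERY = """
select count(*) from (
    select id
    from patient_details
    group by id
    having min(case when diagnosis_code = 5 then diagnosis_date end)
         < max(case when diagnosis_code = 5.1 then diagnosis_date end)
) progressed
"""

def count_progressed(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table patient_details (id text, diagnosis_date text, diagnosis_code real)")
    con.executemany("insert into patient_details values (?, ?, ?)", rows)
    (n,) = con.execute(QUERY).fetchone()
    con.close()
    return n

print(count_progressed(ROWS))  # 2
```

A patient with no code 5 row gets a NULL minimum, so the comparison is not true and the patient is excluded, which is why the demo data above (PT2073 starting at 4.00) only returns PT2013.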
I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many, which is different from how this relationship is usually represented (Many-to-One).
The Standing Data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates (the DateActiveFrom column), so a task's score can change at different points in time. What I am trying to do is query the Production Table so that each TaskID is looked up in the Standing Data table, using the date the task was logged to determine which score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the code below, but because multiple Standing Data records meet the join criteria, each production record is duplicated with two different scores:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return a single (scalar) Score value for each record based on its date?
You can use OUTER APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], sd.[Score]
FROM ProductionTable AS p
OUTER APPLY
( -- the most recent score that was active on or before the production date
SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY s.[DateActiveFrom] DESC
) AS sd;
If you want the score matched at the record level instead, change the WHERE clause inside the APPLY accordingly.
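SQLite has no APPLY, but the same "latest row on or before a date" lookup can be sketched with a correlated scalar subquery using ORDER BY ... LIMIT 1, which plays the role of TOP (1). This is a swapped-in equivalent for illustration, not the answer's SQL Server code. The snake_case table and column names are hypothetical; the dates are the question's, rewritten as ISO text. Note that 30/02 and 31/02 are not real calendar dates, so they survive here only because SQLite compares them as plain strings.

```python
import sqlite3

PRODUCTION = [  # (record_id, date, task_id)
    (1, "2020-02-27", 1),
    (2, "2020-02-27", 1),
    (3, "2020-02-30", 1),  # copied from the question as text, not a valid date
    (4, "2020-02-31", 1),
]
STANDING_DATA = [  # (record_id, task_id, date_active_from, score)
    (1, 1, "2020-02-01", 1.5),
    (2, 1, "2020-02-28", 2.0),
]

# For each production row, take the score whose active-from date is the
# latest one not after the production date.
QUERY = """
select p.record_id,
       (select s.score
        from standing_data s
        where s.task_id = p.task_id
          and s.date_active_from <= p.date
        order by s.date_active_from desc
        limit 1) as score
from production p
order by p.record_id
"""

def scores_as_of_date():
    con = sqlite3.connect(":memory:")
    con.execute("create table production (record_id integer, date text, task_id integer)")
    con.execute("create table standing_data (record_id integer, task_id integer, "
                "date_active_from text, score real)")
    con.executemany("insert into production values (?, ?, ?)", PRODUCTION)
    con.executemany("insert into standing_data values (?, ?, ?, ?)", STANDING_DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

print(scores_as_of_date())  # [(1, 1.5), (2, 1.5), (3, 2.0), (4, 2.0)]
```

Like OUTER APPLY, the scalar subquery yields NULL rather than dropping the row when no score was active yet.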
I have prescription drug data that has a prescription date and the number of days supplied for that prescription. I am trying to estimate actual drug intake dates, which can differ from the prescription date if people (1) refill their prescription before their current prescription is used up or (2) lose their current prescription and so need a refill.
Below is sample data for 1 patient:
| patient_id | rx_start_date | days_supply |
|------------|---------------|-------------|
| 1 | 1/10/2013 | 3 |
| 1 | 1/11/2013 | 3 |
| 1 | 1/14/2013 | 3 |
Without adjusting for stockpiling, the end dates are calculated as rx_start_date + days_supply - 1:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/11/2013 | 3 | 1/13/2013 |
| 1 | 1/14/2013 | 3 | 1/16/2013 |
As you can see, the start date of the 2nd prescription falls within the 1st prescription. If we assume that the patient refilled early, then the actual intake for the 2nd prescription should start on 1/13/2013. But shifting the 2nd prescription also moves its end date, which then overlaps the 3rd prescription, so that one must be moved as well. See the expected resulting table below:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/13/2013 | 3 | 1/15/2013 |
| 1 | 1/16/2013 | 3 | 1/18/2013 |
The other case: if the current prescription overlaps the next one by more than 50%, then we assume the patient lost their prescription, and the 2nd prescription's start date is the actual intake date. This means, though, that we need to truncate the current prescription to end when the 2nd one starts.
The algorithm is relatively simple as a non-SQL iterative solution, but I'm having trouble with a generic SQL solution, since adjusting the dates at time X can cascade and adjust many other dates. I'm using Impala SQL, so recursive CTEs are not an option, and I'd like this to work on other databases, so database-specific functions are not ideal either.
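A minimal sketch of the iterative approach the question mentions, covering only the early-refill (stockpiling) rule; the 50% lost-prescription rule is left out because the question does not pin down what the 50% is measured against. The function name adjust_for_stockpiling is hypothetical. Each adjusted start depends only on the previous adjusted end, which is where the cascade comes from. It also shows that, when there are no gaps, each adjusted end is simply the first start plus the cumulative days supplied, minus one.

```python
from datetime import date, timedelta

def adjust_for_stockpiling(prescriptions):
    """prescriptions: (rx_start_date, days_supply) pairs for one patient.

    Returns (start, end) pairs with inclusive end dates."""
    adjusted = []
    prev_end = None
    for rx_start, days_supply in sorted(prescriptions):
        # An early refill starts the day after the previous supply runs out;
        # using the *adjusted* previous end is what makes shifts cascade.
        if prev_end is None:
            start = rx_start
        else:
            start = max(rx_start, prev_end + timedelta(days=1))
        end = start + timedelta(days=days_supply - 1)
        adjusted.append((start, end))
        prev_end = end
    return adjusted

sample = [(date(2013, 1, 10), 3), (date(2013, 1, 11), 3), (date(2013, 1, 14), 3)]
for start, end in adjust_for_stockpiling(sample):
    print(start, end)
# 2013-01-10 2013-01-12
# 2013-01-13 2013-01-15
# 2013-01-16 2013-01-18
```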
The following should give you what you are looking for, so long as there are no gaps in the treatment regime:
with aggs as (
  select d1.patient_id,
         d1.rx_start_dt,
         sum(ds.days_supply) days_supply,
         min(ds.rx_start_dt) + sum(ds.days_supply) - 1 end_dt
  from drugs d1
  inner join drugs ds
    on ds.patient_id = d1.patient_id and ds.rx_start_dt <= d1.rx_start_dt
  group by d1.patient_id, d1.rx_start_dt)
select patient_id,
       coalesce(lag(end_dt + 1) over (partition by patient_id order by rx_start_dt), rx_start_dt) start_dt,
       end_dt
from aggs;
Using the given sample data, this gives as output:
| ID | Start      | End        |
|----|------------|------------|
| 1  | 2013-01-10 | 2013-01-12 |
| 1  | 2013-01-13 | 2013-01-15 |
| 1  | 2013-01-16 | 2013-01-18 |
This was tested on Oracle, but all the functions used appear to also be available in Impala, so it should work there too.
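To check the cumulative-sum query outside Oracle, here is a sketch that runs it in SQLite via Python's sqlite3 module. The only change is the date arithmetic: Oracle's "date + n" is swapped for SQLite's date(x, '+n days') modifier, and dates are stored as ISO text. It assumes a SQLite build with window functions (3.25 or later) for lag(); the function name stockpile_dates is hypothetical.

```python
import sqlite3

DRUGS = [(1, "2013-01-10", 3), (1, "2013-01-11", 3), (1, "2013-01-14", 3)]

QUERY = """
with aggs as (
    select d1.patient_id,
           d1.rx_start_dt,
           sum(ds.days_supply) as days_supply,
           -- first start + cumulative supply - 1, using SQLite date modifiers
           date(min(ds.rx_start_dt), '+' || (sum(ds.days_supply) - 1) || ' days') as end_dt
    from drugs d1
    join drugs ds
      on ds.patient_id = d1.patient_id and ds.rx_start_dt <= d1.rx_start_dt
    group by d1.patient_id, d1.rx_start_dt
)
select patient_id,
       -- each start is the day after the previous row's end (first row keeps its own start)
       coalesce(lag(date(end_dt, '+1 day')) over (partition by patient_id order by rx_start_dt),
                rx_start_dt) as start_dt,
       end_dt
from aggs
order by patient_id, rx_start_dt
"""

def stockpile_dates(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table drugs (patient_id integer, rx_start_dt text, days_supply integer)")
    con.executemany("insert into drugs values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in stockpile_dates(DRUGS):
    print(row)
```

As the answer notes, this relies on there being no gaps: the end date always counts forward from the patient's first start.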
I have a table called transactions that has the ledger from a storefront. Let's say it looks like this, for simplicity:
trans_id | cust | date | num_items | cost
---------+------+------+-----------+------
1 | Joe | 4/18 | 6 | 14.83
2 | Sue | 4/19 | 3 | 8.30
3 | Ann | 4/19 | 1 | 2.28
4 | Joe | 4/19 | 4 | 17.32
5 | Sue | 4/19 | 3 | 8.30
6 | Lee | 4/19 | 2 | 9.55
7 | Ann | 4/20 | 1 | 2.28
For the credit card purchases, I subsequently get an electronic ledger that has the full timestamp. So I have a table called cctrans with date, time, cust, cost, and some other info. I want to add a column trans_id to the cctrans table, that references the transactions table.
The update statement for this is simple enough, except for one hitch: I have an 11 AM transaction from Sue on 4/19 for $8.30 and a 3 PM transaction from Sue on 4/19 for $8.30 that are the same in the transactions table except for the trans_id field. I don't really care which record of the cctrans table gets linked to trans_id 2 and which one gets linked to trans_id 5, but they can't both be assigned the same trans_id.
The question here is: How do I accomplish that (ideally in a way that also works when a customer makes the same purchase three or four times in a day)?
The best I have so far is to do:
UPDATE cctrans AS cc
SET trans_id = t.trans_id
FROM transactions AS t
WHERE cc.cust = t.cust AND cc.date = t.date AND cc.cost = t.cost
And then fix them one-by-one via manual inspection. But obviously that's not my preferred solution.
Thanks for any help you can provide.
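One common approach, not taken from this thread, is to number the duplicates on both sides with ROW_NUMBER() partitioned by (cust, date, cost) and then join on that number as well, so the first Sue/4/19/$8.30 card row pairs with the first matching ledger row, the second with the second, and so on for three or four repeats. The sketch below does this in SQLite via Python; it assumes window-function support (SQLite 3.25 or later), uses SQLite's rowid to identify cctrans rows, and the time values are hypothetical 24-hour strings standing in for the 11 AM and 3 PM transactions.

```python
import sqlite3

TRANSACTIONS = [
    (1, "Joe", "4/18", 6, 14.83),
    (2, "Sue", "4/19", 3, 8.30),
    (3, "Ann", "4/19", 1, 2.28),
    (4, "Joe", "4/19", 4, 17.32),
    (5, "Sue", "4/19", 3, 8.30),
    (6, "Lee", "4/19", 2, 9.55),
    (7, "Ann", "4/20", 1, 2.28),
]
CCTRANS = [  # (date, time, cust, cost): only the two card rows described
    ("4/19", "11:00", "Sue", 8.30),
    ("4/19", "15:00", "Sue", 8.30),
]

# Number identical (cust, date, cost) rows on each side, then pair by number.
PAIR_QUERY = """
with t as (
    select trans_id, cust, date, cost,
           row_number() over (partition by cust, date, cost order by trans_id) as rn
    from transactions
), c as (
    select rowid as cc_rowid, cust, date, cost,
           row_number() over (partition by cust, date, cost order by time) as rn
    from cctrans
)
select t.trans_id, c.cc_rowid
from t
join c on c.cust = t.cust and c.date = t.date and c.cost = t.cost and c.rn = t.rn
"""

def link_transactions():
    con = sqlite3.connect(":memory:")
    con.execute("create table transactions (trans_id integer, cust text, date text, "
                "num_items integer, cost real)")
    con.execute("create table cctrans (date text, time text, cust text, cost real, trans_id integer)")
    con.executemany("insert into transactions values (?, ?, ?, ?, ?)", TRANSACTIONS)
    con.executemany("insert into cctrans (date, time, cust, cost) values (?, ?, ?, ?)", CCTRANS)
    pairs = con.execute(PAIR_QUERY).fetchall()
    # Each trans_id appears in at most one pair, so no two card rows share it.
    con.executemany("update cctrans set trans_id = ? where rowid = ?", pairs)
    linked = con.execute("select time, trans_id from cctrans order by time").fetchall()
    con.close()
    return linked

print(link_transactions())  # [('11:00', 2), ('15:00', 5)]
```

Which card row gets which trans_id follows the ORDER BY inside each ROW_NUMBER(), which is arbitrary but deterministic; that matches the question's "I don't really care which" requirement.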
I have a query that's trying to sum up a patient's length of stay at a hospital. Here is an example of the data
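| Patient | Admission_ID | Admission_Event_ID | Admission_Event_Type | Start Date | End Date | Duration | Linked_Admission |
|---------|--------------|--------------------|----------------------|------------|----------|----------|------------------|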
| P0001 | ADM0001 | AE1 | (formal) Separation | 2012-12-18 | 2012-12-18 | 0 | ADM0002 |
| P0001 | ADM0001 | AE2 | Statistical Admission | 2012-12-17 | 2012-12-18 | 1 | ADM0002 |
| P0001 | ADM0002 | AE3 | Statistical Separation| 2012-12-17 | 2012-12-17 | 0 | NULL |
| P0001 | ADM0002 | AE4 | (formal) Admission | 2012-11-30 | 2012-12-17 | 17 | NULL |
| P0002 | ADM0003 | AE5 | (formal) Admission | 2012-11-30 | 2012-12-25 | 25 | NULL |
. . .
EDIT: Forgot to mention, there is a column (Linked_Admission) that links the admission IDs (it is only used when the patient is statistically separated and admitted).
By definition, the length of stay is calculated for each patient from the start of their admission until they are separated (statistical separations and admissions carry on with the same stay, but they're given a new Admission ID).
A report is run to find out the average length of stay (ALOS) for the hospital and its units; the user selects two dates to report between. I've used a CTE (let's call it CTESep) to get all the patients that have been formally separated within the reporting period. I then use another CTE (called CTEAdmissions) to get all the admissions of the patients within CTESep. This is where I get stuck.
I need to sum up the Durations of the patient to get their total length of stay for that admission (which is a combination of ADM0001 and ADM0002) so the total LOS will be 18, rather than 17 and 1.
My idea was to
ORDER BY Patient
, End_Date DESC
, adm_id
, CASE WHEN
Admission_Event_Type = '(formal) Separation ' THEN 1
WHEN Admission_Event_Type = 'Statistical Admission ' THEN 2
WHEN Admission_Event_Type = 'Statistical Separation' THEN 3
WHEN Admission_Event_Type = '(formal) Admission ' THEN 4
END ASC
Then sum up the duration based on a condition. The rule is: 'Start summing up the duration of each patient's admission from a formal separation to a formal admission', which I'm not sure how to do.
I've tried:
SELECT SUM(Duration) OVER(PARTITION BY Patient) AS 'Sum'
But that will give me the total LOS for the patient across ALL their admissions (if they have more than one separation within that reporting period)
I've also tried
SELECT SUM(Duration) OVER(PARTITION BY Patient, Admission_ID) AS 'Sum'
But of course that gives me the LOS of a patient between a formal admission and a statistical separation (and not the LOS by its actual definition).
Has anyone got a different way of tackling this problem? By the way, I'm using Sybase.
How about this:
select patientid,
admissionid,
datediff(day,
max(case when Admission_Event_Type = '(formal) Admission ' then startdate end),
max(case when Admission_Event_Type = '(formal) Separation ' then enddate end)
) as total_length
from data
group by patientid, admissionid
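Grouping by the admission ID alone keeps ADM0001 and ADM0002 apart, so one hedged alternative is to group by coalesce(Linked_Admission, Admission_ID) and sum Duration per resulting stay, which matches the question's expected 18 days for P0001. The sketch below assumes, as in the sample rows, that a statistically re-admitted episode's Linked_Admission points back to the stay's original Admission_ID and that links are only one level deep (longer chains would need repeated resolution). It runs in SQLite via Python, not Sybase; the table name admission_events, the column name event_type and the function name are hypothetical.

```python
import sqlite3

# (patient, admission_id, event_id, event_type, start_date, end_date, duration, linked_admission)
EVENTS = [
    ("P0001", "ADM0001", "AE1", "(formal) Separation", "2012-12-18", "2012-12-18", 0, "ADM0002"),
    ("P0001", "ADM0001", "AE2", "Statistical Admission", "2012-12-17", "2012-12-18", 1, "ADM0002"),
    ("P0001", "ADM0002", "AE3", "Statistical Separation", "2012-12-17", "2012-12-17", 0, None),
    ("P0001", "ADM0002", "AE4", "(formal) Admission", "2012-11-30", "2012-12-17", 17, None),
    ("P0002", "ADM0003", "AE5", "(formal) Admission", "2012-11-30", "2012-12-25", 25, None),
]

# Linked episodes fold into the admission they link to; unlinked ones keep their own ID.
QUERY = """
select patient,
       coalesce(linked_admission, admission_id) as stay_id,
       sum(duration) as total_los
from admission_events
group by patient, coalesce(linked_admission, admission_id)
order by patient, stay_id
"""

def total_length_of_stay(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table admission_events (patient text, admission_id text, event_id text, "
                "event_type text, start_date text, end_date text, duration integer, "
                "linked_admission text)")
    con.executemany("insert into admission_events values (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(total_length_of_stay(EVENTS))  # [('P0001', 'ADM0002', 18), ('P0002', 'ADM0003', 25)]
```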
I've been struggling away at this and I think I'm in need of a hint!
I have a delivery_table for eBooks and in this table are the following columns: catalogue_number, delivery_date, status_id
I'm trying to bring back all entries from column catalogue_number where the status_id is '4', but only when there is a more recent delivery of the same catalogue_number at status '1'.
Here's an example of what I might see for a particular eBook:
+---------------+------------------+-----------+
| Delivery_Date | Catalogue_Number | Status_Id |
+---------------+------------------+-----------+
| 12/02/2012 | ABA00001 | 3 |
+---------------+------------------+-----------+
| 01/02/2012 | ABA00001 | 1 |
+---------------+------------------+-----------+
| 20/01/2012 | ABA00001 | 4 |
+---------------+------------------+-----------+
| 18/01/2012 | ABA00001 | 4 |
+---------------+------------------+-----------+
| 10/01/2012 | ABA00001 | 3 |
+---------------+------------------+-----------+
| 01/01/2012 | ABA00001 | 3 |
+---------------+------------------+-----------+
The second delivery from the top is at status 1 (ingested), but there are earlier deliveries at status '4' (errored). I want to bring back every catalogue number at status '4' where this is the case.
I'm guessing it's going to be some kind of nested query scenario but I'm struggling to think up the logic of what the query needs to do!
Thanks in advance for your help!
I'm using Microsoft SQL Server 2008.
You could use a CTE (common table expression) to define the subset of catalogue numbers that have a status of 1, together with their latest status-1 delivery date, and then join it back to your table, e.g.:
with Order1Statuses as (
select Catalogue_number,
max(Delivery_date) as status1date
from delivery_table
where status_id = 1
group by catalogue_number
)
select t4.Catalogue_number
from delivery_table t4 inner join Order1Statuses t1 on
t4.catalogue_number = t1.catalogue_number
and t4.delivery_date < t1.status1date
and t4.status_id = 4
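A quick way to sanity-check the CTE (with the trailing comma after status1date removed) is to run it in SQLite via Python on the question's sample rows. The dd/mm/yyyy dates are rewritten as ISO text so that string comparison matches chronological order; in SQL Server the column would normally be a date type instead. The function name errored_before_ingest is hypothetical.

```python
import sqlite3

ROWS = [  # (delivery_date, catalogue_number, status_id)
    ("2012-02-12", "ABA00001", 3),
    ("2012-02-01", "ABA00001", 1),
    ("2012-01-20", "ABA00001", 4),
    ("2012-01-18", "ABA00001", 4),
    ("2012-01-10", "ABA00001", 3),
    ("2012-01-01", "ABA00001", 3),
]

# The answer's query: status-4 deliveries older than the latest status-1 delivery.
QUERY = """
with Order1Statuses as (
    select catalogue_number, max(delivery_date) as status1date
    from delivery_table
    where status_id = 1
    group by catalogue_number
)
select t4.catalogue_number, t4.delivery_date
from delivery_table t4
inner join Order1Statuses t1
    on t4.catalogue_number = t1.catalogue_number
   and t4.delivery_date < t1.status1date
   and t4.status_id = 4
order by t4.delivery_date
"""

def errored_before_ingest(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table delivery_table (delivery_date text, catalogue_number text, "
                "status_id integer)")
    con.executemany("insert into delivery_table values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(errored_before_ingest(ROWS))
# [('ABA00001', '2012-01-18'), ('ABA00001', '2012-01-20')]
```

Both errored deliveries come back, so if you only need each catalogue number once, select t4.catalogue_number with DISTINCT instead.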