Summing up only the values of previous rows with the same ID - sql

As I am preparing my data for predicting no-shows at a hospital, I ran into the following problem: In the query below I tried to get the number of shows/no-shows relatively shown to the number of appointments (APPTS). INDICATION_NO_SHOW means whether a patient showed up at a appointment. 0 means show, and 1 means no-show.
with t1 as
(
select
PAT_ID
,APPT_TIME
,APPT_ID
,ROW_NUMBER () over(PARTITION BY PAT_ID order by pat_id,APPT_TIME) as [TOTAL_APPTS]
,INDICATION_NO_SHOW
from appointments
)
,
t2 as
(
t1.PAT_ID
,t1.APPT_TIME
,INDICATION_NO_SHOW
,sum(INDICATION_NO_SHOW) over(order by PAT_ID, APPT_TIME ) as TOTAL_NO_SHOWS
,TOTAL_APPT
from t1
)
SELECT *
,(TOTAL_APPT- TOTAL_NO_SHOWS) AS TOTAL_SHOWS
FROM T2
order by PAT_ID, APPT_TIME
This resulted into the following dataset:
PAT ID APPT_TIME INDICATION_NO_SHOW TOTAL_SHOWS TOTAL_NO_SHOWS TOTAL_APPTS
1 1-1-2001 0 1 0 1
1 1-2-2001 0 2 0 2
1 1-3-2001 1 2 1 3
1 1-4-2001 0 3 1 4
2 1-1-2001 0 0 1 1
2 2-1-2001 0 1 1 2
2 2-2-2001 1 1 2 3
2 2-3-2001 0 2 2 4
As you can see my query only worked for patient 1, and then it also counts the no-shows for patient 1 for patient 2. So individually it worked for 1 patient, but not over the whole dataset.
The TOTAL_APPTs column worked out, because it counted the number of appts the patient had at the moment of that given appt. My question is: How do I succesfully get these shows and no-shows succesfully added up (as I did for patient 1)? I'm completely aware why this query doesn't work, I'm just completely in the blue on how to fix it..

I think that you can just use window functions. You seem to be looking for window sums of shows and no shows per patient, so:
select
pat_id,
appt_time,
indication_no_show,
sum(1 - indication_no_show)
over(partition by pat_id order by appt_time) total_shows,
sum(indication_no_show)
over(partition by pat_id order by appt_time) total_no_shows
from appointments

Related

Rails 5 - I need to return the first record of a group, but there are records that have no group

So as the title suggests I need to return the records from a table, where these records can belong to a group.
If there are several records in a group, return only the last one, and if the record does not belong to any group, return it together.
I have the following tables
(automation_execution) 1 --> n (automation_execution_action) 1 <---> 1 (workable)
I need to return workable table records, where they may or may not be linked to automation tables.
automation_execution
id
company_id
1
1
2
1
automation_execution_ations
id
automation_execution_id
workable_id
1
1
1
2
1
2
workable
id
company_id
status
created_at
1
1
finished
2022-01-19 19:48:24
2
1
processing
2022-01-19 18:00:24
3
1
processing
2022-01-19 18:00:24
4
1
processing
2022-01-19 18:00:24
In the example above, we have 4 workables, 1 and 2 belong to an automation and 3 and 4 do not, in this example I would need to return the record 2, 3 and 4.
So this SQL works:
select workables.*
from (
select workables.*,
automation_execution_actions.automation_execution_id,
row_number()
over (partition by automation_execution_actions.automation_execution_id order by workables.id desc) as rn
from workables
left join automation_execution_actions on automation_execution_actions.workable_id = workables.id
) as workables
where rn = 1
OR automation_execution_id IS NULL
order by id;

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
sum(case when prev_id = id and num_of_days % 5 <> 0
then 0 else 1
end) over (order by id, num_of_days)
from (select t.*,
lag(id) over (order by id, num_of_days) as prev_id
from t
) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.

A way to only select rows that indicate progression, and ignore rows that indicate recovery?

I have a dataset with thousands of patients that include their ID and their disease stage over time. The data is complicated because there are patients that get worse, then recover, then get worse again. I would like to only select rows from a patient that indicate disease progression.
For example, ID 1 progresses from 3 > 4, then recovers back to stage 1 before worsening again to stage 5. How can I ignore rows that indicate recovery, and only keep rows that indicate progression over time? Is this even possible using SQL? Thank you in advance!
What data looks like:
ID stage_date disease_stage
1 1-JAN-15 3
1 3-JAN-15 4
1 6-JAN-15 1
1 9-JAN-15 5
1 10-JAN-15 1
What I want:
ID stage_date disease_stage
1 1-JAN-15 3
1 3-JAN-15 4
1 9-JAN-15 5
If I understand correctly, you want the rows that match the cumulative maximum:
select t.*
from (select t.*,
max(disease_stage) over (partition by id order by disease_stage) as max_running_disease_stage
from t
) t
where max_running_disease_stage = disease_stage;
This will keep ties. If you don't want ties:
select t.*
from (select t.*,
max(disease_stage) over (partition by id
order by stage_date
rows between unbounded preceding and 1 preceding
) as max_running_disease_stage
from t
) t
where max_running_disease_stage is null or
disease_stage > max_running_disease_stage;

Retrieve unique rows based on id

I have two tables:
Report
ReportId CreatedDate
1 2018-01-12
2 2018-02-12
3 2018-03-12
ReportSpecialty
SpecialtyId ReportId IsPrimarySpecialty
1 1 1
2 2 1
3 3 1
1 2 0
1 3 0
I am trying to write a query that will retrieve me the last 10 reports that were published. However, I need to get 1 report from each specialty. Assume there are 100 specialties, I can pass in as an argument any number of specialties, 10, 20, 5, 2, etc...
I'm trying to figure out a way where if I send it all specialties, it will get me the last 10 reports posted based on the last date created, but it won't give me 2 articles from same specialty. If I send it 10 specialties, then I will get 1 of each. If I send it 5, then I'll get 2 of each. If I send it 3 then I'll get 4 of 1 and 3 of other two.
I may need to write multiple queries for this, I'm trying to see if there is a way to do this on the SQL side of things? If there isn't, then how would I break down to multiple queries to get the result I want?
What I have tried is this, however I get multiple reports with same specialties:
SELECT TOP 10 r.ReportId, rs.SpecialtyId, r.CreatedDate
FROM Report r
INNER JOIN ReportSpecialty rs ON r.ReportId = rs.ReportId AND rs.IsPrimarySpecialty = 1
GROUP BY rs.SpecialtyId, r.AceReportid, r.CreatedDate
ORDER BY r.CreatedDate DESC
with cte as (
SELECT R.ReportId, R.CreatedDate, RS.SpecialtyId,
ROW_NUMBER() OVER (PARTITION BY RS.SpecialtyId
ORDER BY R.CreatedDate DESC) as rn
FROM Report R
JOIN ReportSpecialty RS
ON R.ReportId = RS.ReportId
AND RS.IsPrimarySpecialty = 1
WHERE RS.SpecialtyId IN ( .... ids ... )
)
SELECT TOP 10 *
FROM cte
ORDER BY rn, CreatedDate DESC
row_number will create a id for each speciality, so if you pass 3 speciality you will get something like this.
rn speciality_id
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3

Return results where first entry is 1 and all subsequent rows are 0

I m working on weird SQL query
Patient_ID Count order_no
1 1 1
2 1 2
2 0 3
2 0 4
3 1 5
3 0 6
where I need to count the patient as above, for every new patient , the count column is 1.
If repeated , the below entry it should be 0
I m confused how should make that work in SQL
In order to make the first entry 1 and all subsuqent entries 0, I believe you need a ranking with partition by the order number. Please checkout the sqlfiddle below to test results.
http://www.sqlfiddle.com/#!3/4e2e2/17/0
SELECT
patient_id
,CASE WHEN r.rank = 1
THEN 1
ELSE 0
END
, order_number
FROM
(
SELECT
order_number
,patient_id
,ROW_NUMBER() OVER (PARTITION BY patient_id ORDER BY order_number)[rank]
FROM
PatientTable
)r