How to capture first row in a grouping and subsequent rows that are each a minimum of 15 days apart? - sql

Assume a given insurance will only pay for the same patient visiting the same doctor once in 15 days. If the patient comes once, twice, or twenty times within those 15 days to the doctor, the doctor will get only one payment. If the patient comes again on Day 16 or Day 18 or Day 29 (or all three!), the doctor will get a second payment. The first visit (or first after the 15 day interval) is always the one that must be billed, along with its complaint.
The SQL for all visits can be loosely expressed as follows:
SELECT VisitID
,PatientID
,VisitDtm
,DoctorID
,ComplaintCode
FROM Visits
The goal is to query the Visits table in a way that would capture only billable incidents.
I have been trying to work through this question which is in essence quite similar to Group rows with that are less than 15 days apart and assign min/max date. However, the reason this won't work for me is that, as the accepted answerer (Salman A) points out, Note that this could group much longer date ranges together e.g. 01-01, 01-11, 01-21, 02-01 and 02-11 will be grouped together although the first and last dates are more than 15 days apart. This presents a problem for me as it is a requirement to always capture the next incident after 15 days have passed from the first incident.
I have spent quite a few hours thinking this through and poring over like problems, and am looking for help in understanding the path to a solution, not necessarily an actual code solution. If it's easier to answer in the context of a code solution, that is fine. Any and all guidance is very much appreciated!

This type of task requres a iterative process so you can keep track of the last billable visit. One approach is a recursive cte.
You would typically enumerate the visits of each patient use row_number(), then traverse the dataset starting from the first visit, while keeping track of the last "billable" visit. Once a visit is met that is more than 15 days latter than the last billable visit, the value resets.
with
data as (
select visitid, patientid, visitdtm, doctorid,
row_number() over(partition by patientid order by visitdtm) rn
from visits
),
cte as (
select d.*, visitdtm as billabledtm from data d where rn = 1
union all
select d.*,
case when d.visitdtm >= dateadd(day, 15, c.billabledtm)
then d.visitdtm
else c.billabledtm
end
from cte c
inner join data d
on d.patientid = c.patientid and d.rn = c.rn + 1
)
select * from cte where visitdtm = billabledtm order by patientid, rn
If a patient may have more than 100 visits, then you need to add option (maxrecursion 0) at the very end of the query.

Here's another approach. Similar to GMB's this adds a row_number to the Visits table in a CTE but it also adds the lead date difference between VisitDtm's. Then it takes cumulative "sum over" of the date difference and divides by 15. When that quotient increases by a full integer, it represents a billable event in the data.
Something like this
;with lead_cte as (
select v.*, row_number() over (partition by PatientId order by VisitDtm) rn,
datediff(d, VisitDtm, lead(VisitDtm) over (partition by PatientId order by VisitDtm)) lead_dt_diff
from Visits v),
cum_sum_cte as (
select lc.*, sum(lead_dt_diff) over (partition by PatientId order by VisitDtm)/15 cum_dt_diff
from lead_cte),
min_billable_cte as (
select PatientId, cum_dt_diff, min(rn) min_rn
from cum_sum_cte
group by PatientId, cum_dt_diff)
select lc.*
from lead_cte lc
join min_billable_cte mbc on lc.PatintId=mbc.PatientId
and lc.rn=mbc.min_rn;

Related

How to conditional SQL select

My table consists of user_id, revenue, publish_month columns.
Right now I use group_by user_id and sum(revenue) to get revenue for all individual users.
Is there a single SQL query I can use to query for user revenue across a time period conditionally? If for a specific user, there is a row for this month, I want to query for this month, last month and the month before. If there is not yet a row for this month, I want to query for last month and the two months before.
Any advice with which approach to take would be helpful. If I should be using cases, if-elses with exists or if this is do-able with a single SQL query?
UPDATE---since I did a bad job of describing the question, I've come to include some example data and expected results
Where current month is not present for user 33
Where current month is present
Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user...
SELECT
user_id, SUM(revenue) as s_revenue
FROM
(
SELECT
user_id, revenue, publish_month,
MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
FROM
yourtableyoudidnotname
)
summarised
WHERE
publish_month >= DATEADD(month, -2, user_latest_publish_month)
GROUP BY
user_id
If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE()))
The ambiguity here is why it is important to include a Minimal Reproducible Example
With input data and require results, we could test our code against your requirements
If you're using strings for the publish_month, you shouldn't be, and should fix that with utmost urgency.
You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.
Here is how:
SELECT user_id, revienue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
now you just select the items with RN less than 3 and do your sum
SELECT user_id, SUM(revenue) as s_revenue
FROM (
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
) X
WHERE RN <= 3
GROUP BY user_id
You could also do this without a sub query if you use the windowing function for SUM and a range, but I think this is easier to understand.
From the comment -- there could be an issue if you have months from more than one year. To solve this make the biggest number in the order by always the most recent. so instead of
ORDER BY publish_month DESC
you would have
ORDER BY (100*publish_year)+publish_month DESC
This means more recent years will always have a higher number so january of 2023 will be 202301 while december of 2022 will be 202212. Since january is a bigger number it will get a row number of 1 and december will get a row number of 2.

SQL - calculating hours since the earliest date in a partition

I have the following SQL code:
select
survey.ContactId,
survey.CommId,
survey.CommCreatedDate,
survey.CommIdStatus,
br.[Value],
null as HoursPastSinceFirstActiveSurvey,
row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row]
from
Survey_Completed survey
inner join
Business_Rules br on br.Name = 'OPT_OUT_TIME'
where
survey.CommIdStatus = 'Active'
Which produces the following result set:
What I need help with is filling out HoursPastSinceFirstActiveSurvey. The logic here should be as follows:
Calculate the total number of hours that has passed since the earliest (by CommCreatedDate) record in the partition for consecutive (by day) records. In order to address the "consecutive" part, I was thinking perhaps it might be possible to add to the partitioning logic to only partition if the days are consecutive. I'm not entirely sure if that's possible though. So for example, look at the last two records. They are grouped as a partition and the dates are consecutive and the earliest date/time on this partition is Nov 11 2020 12:00 AM. So I would want to perform the following in order to populate HoursPastSinceFirstActiveSurvey for these two records:
Today's date minus Nov 11 2020 12:00 AM.
This would be the value for those two records in the partition for HoursPastSinceFirstActiveSurvey. I am not sure where to even start with this!! Thank you all.
I was able to solve for this by the following query. Feedback is entirely WELCOME!
select
Q2.ContactId,
min(Q2.CommCreatedDate) as MinDate,
max(Q2.CommCreatedDate) as MaxDate,
Q2.Consecutive,
datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) AS HoursPassed
from
(select
Q1.ContactId,
Q1.CommId,
Q1.CommCreatedDate,
Q1.CommIdStatus,
Q1.[Value],
Q1.Consecutive,
Q1.[row],
Q1.countOfPartition
from
(select
survey.ContactId,
survey.CommId,
survey.CommCreatedDate,
survey.CommIdStatus,
br.[Value],
CAST(dateadd(day,-row_number() over (partition by survey.ContactId order by survey.CommCreatedDate), survey.CommCreatedDate) as Date) as Consecutive,
row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row],
count(*) over (partition by survey.ContactId) as countOfPartition
from
Survey_Completed survey
inner join
Business_Rules br on br.Name = 'OPT_OUT_TIME'
where
survey.CommIdStatus = 'Active') Q1
where
Q1.countOfPartition <> 1) Q2
group by
Q2.ContactId, Q2.Consecutive, Q2.[Value]
having
datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) > Q2.[Value]

Window functions and calculating averages with tricky data manipulation

I have a SQL Server programming challenge involving some manipulations of healthcare patient pulse readings.
The goal is to do an average of readings within a certain time period and to only include the latest pulse reading of the day.
As an example, times are appt_time:
PATIENT 1 PATIENT 2
‘1/1/2019 80 ‘1/3/2019 90
‘1/4/2019 85
‘1/2/2019 10 am 78
‘1/2/2019 1 pm 85
‘1/3/2019 90
A patient may or may not have a second reading in a day. Only the last 3 latest chronological readings are used for the average. If less than 3 readings are available, an average is computed for 2 readings, or 1 reading is chosen as average.
Can this be done with the SQL window functions? This is a little more efficient than using a subquery.
I have used first_VALUE desc statements successfully to pick the last pulse in a day. I then have tried various row_number commands to exclude the marked off row (first pulse of the day when 2 readings are present). I cannot seem to correctly calculate the average. I have used row_number in select and from clauses.
with CTEBPI3
AS (
SELECT pat_id
,appt_time
,bp_pulse
,first_VALUE (bp_pulse) over(partition by appt_time order by appt_time desc ) fv
,ROW_NUMBER() OVER (PARTITION BY appt_time ORDER BY APPT_time DESC)RN1
,,Round(Sum(bp_pulse) OVER (PARTITION BY Pat_id) / COUNT (appt_time) OVER (PARTITION BY Pat_id), 0) AS adJAVGSYS3
FROM
pat_enc
WHERE appt_time > '07/15/2018'
)
select *,
WHEN rn=1
Average for pat1 should be 85
Average for pat2 should be 87.5
You can do this with two window functions:
MAX(appt_time) OVER ... to get the latest time per day
DENSE_RANK() OVER ... to get the last three days
You get the date part from your datetime with CONVERT(DATE, appt_time). The average function AVGis already built in :-)
The complete query:
select pat_id, avg(bp_pulse) as average_pulse
from
(
select
pat_id, appt_time, bp_pulse,
max(appt_time) over (partition by pat_id, convert(date, appt_time)) as max_time,
dense_rank() over (partition by pat_id order by convert(date, appt_time) desc) as rn
from pat_enc
) evaluated
where appt_time = max_time -- last row per day
and rn <= 3 -- last three days
group by pat_id
order by pat_id;
If the column bp_pulse is defined as an integer, you must convert it to a decimal to avoid integer arithmetic:
select pat_id, avg(convert(decimal, bp_pulse)) as average_pulse
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3df744fcf2af89cdfd8b3cd8b6546d89
Actually, window functions are not necessarily more efficient. It is worth comparing:
select p.pat_id, avg(p.bp_pulse)
from pat_enc p
where -- appt_time > '2018-07-15' and -- don't know if this is necessary
p.appt_time >= (select distinct convert(date, appt_time)
from pat_enc p2
where p2.pat_id = p.pat_id
order by distinct convert(date, appt_time)
offset 2 row fetch first 1 row only
) and
p.appt_time = (select max(p2.appt_time)
from pat_enc p2
where p2.pat_id = p.pat_id and
convert(date, p2.appt_time) = convert(date, p.appt_time)
);
This wants an index on pat_enc(pat_id, appt_time).
In fact, there are a variety of ways to write this logic, with different mixes of subqueries and window functions (this is one extreme).
Which performs the best will depend on the nature of your data. In particular:
The number of appointments on the same day -- is this normally 1 or a large number?
The overall number of days with appointments -- is this right around three or are there hundreds?
You need to test on your data, but I think window function will work best when relatively few rows are filtered out (~1 appointment/day, ~3 days with appointments). Subqueries will be helpful when more rows are being filtered.

Average Group size per month Over previous ten years

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
So I have no problem getting the average group size for each month. For the Current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into excel get the average for the current month.
For the previous month, I want exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I can continue to this all the way back ten years, but I know there is a better way to do this. I have no problem running this query 120 times, copying and pasting the results into excel to compute the average. However, I'd rather learn a more efficient way to do this.
Another Question, I can't do the following, anyone know a way around it:
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gen.TerminationDate < DATEADD(day, -14,GETDATE())
group by ER.EmployerName
Are you after a query that shows the count by year and month they were added? if so this seems pretty straight forward.
this is using mySQL date functions Year & month.
Select AVG(cnt) FROM (
Select count(*) cnt, Year(dateAdded), Month(dateAdded)
from System_Users su
join system_Employers se on se.employerid = su.employerid
group by Year(dateAdded), Month(dateAdded)) B
The inner query counts and breaks out the counts by year and month We then wrap that in a query to show the avg.
--2nd attempt but I'm Brain FriDay'd out.
This uses a Common table Expression (CTE) to generate a set of data for the count by Year, Month of the employees, and then averages out by month.
if this isn't what your after, sample data w/ expected results would help better frame the question and I can making assumptions about what you need/want.
With CTE AS (
Select Year(dateAdded) YR , Month(DateAdded) MO, count(*) over (partition by Year(dateAdded), Month(dateAdded) order by DateAdded Asc) as RunningTotal
from System_Users su
join system_Employers se on se.employerid = su.employerid
Order by YR ASC, MO ASC)
Select avg(RunningTotal), mo from cte;

Top 10 based on last month showing 6 previous months

I want to show a graph with income from different parties over the last 6 months, but based on the top income of 10 people only based on the last month.
So this can change each month as the top 10 people can change when they deposit more money, so the graph will show these 10 people's deposits of the last 6 months, based on the last month deposit only.
I already used a LAG function and a RANK() OVER PARTITION function.
I don't understand why you'll need rank or lag functions.
You can simply use an IN statement:
SELECT * FROM YourTable t
WHERE t.depositDate between StartRangeDate and EndRangeDate
AND t.ID in(select ID from(SELECT s.id,sum(s.depositAmount) as total
from YourTable s
where s.date between ThisMonthStart and ThisMonthEnd
group by s.id)
order by total
limit 10)
You can play with the first select to select what ever you want/add a group by and sum them or I don't know.