Summarizing consecutive values in T-SQL - sql

I have a question regarding T-SQL:
I have a database of my insurance clients, who have a contractual obligation to pay the company an insurance fee every month. An example of the dataset is presented below:
I have the date, client_id and overdue_flag columns. The latter is binary: 0 if the given client has no overdue payment and 1 if he/she has. The question: I would like to create a summary of overdue months (see image 2).
If it's the first month the client is overdue, the value should be 1, if it's the second, 2, and so on. However, if the client comes clean (makes good on overdue payments), it should go back to 0, and if the same client is overdue again, the count of overdue months should restart from 1. In other words: I would only like to sum the consecutive overdue months.
Thanks in advance for the help!

Use a cumulative sum of the non-overdue months to define the groups: each row gets the number of non-overdue months up to and including it, so every run of consecutive overdue months shares one group value. Then use row_number() within the overdue rows of each group:
select t.*,
       (case when overdue_flag = 1
             then row_number() over (partition by client_id, grp, overdue_flag order by date)
             else 0  -- non-overdue months go back to 0
        end) as months_overdue
from (select t.*,
             sum(1 - overdue_flag) over (partition by client_id order by date) as grp
      from t
     ) t
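To see how the grouping plays out, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows are made up for illustration, the table is called t as in the answer, and SQLite stands in for SQL Server (it needs SQLite 3.25 or later for window functions), so treat it as a check of the logic rather than of T-SQL syntax.

```python
import sqlite3

# Hypothetical sample rows: (date, client_id, overdue_flag).
rows = [
    ("2020-01-01", 1, 0),
    ("2020-02-01", 1, 1),
    ("2020-03-01", 1, 1),
    ("2020-04-01", 1, 0),
    ("2020-05-01", 1, 1),
    ("2020-01-01", 2, 1),
    ("2020-02-01", 2, 1),
]

con = sqlite3.connect(":memory:")
con.execute("create table t (date text, client_id int, overdue_flag int)")
con.executemany("insert into t values (?, ?, ?)", rows)

query = """
select client_id, date, overdue_flag,
       case when overdue_flag = 1
            then row_number() over (partition by client_id, grp, overdue_flag
                                    order by date)
            else 0
       end as months_overdue
from (select t.*,
             -- running count of non-overdue months: constant within a streak
             sum(1 - overdue_flag) over (partition by client_id
                                         order by date) as grp
      from t) t
order by client_id, date
"""
result = [row[3] for row in con.execute(query)]
print(result)  # [0, 1, 2, 0, 1, 1, 2]
```

Client 1 goes 0, 1, 2, back to 0 after paying, then restarts at 1; client 2 is overdue from the first row and counts 1, 2.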

Related

How to conditional SQL select

My table consists of user_id, revenue, publish_month columns.
Right now I use group_by user_id and sum(revenue) to get revenue for all individual users.
Is there a single SQL query I can use to query for user revenue across a time period conditionally? If for a specific user, there is a row for this month, I want to query for this month, last month and the month before. If there is not yet a row for this month, I want to query for last month and the two months before.
Any advice on which approach to take would be helpful. Should I be using CASE expressions or IF/ELSE with EXISTS, or is this doable with a single SQL query?
UPDATE---since I did a bad job of describing the question, I've come to include some example data and expected results
Where current month is not present for user 33
Where current month is present
Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user...
SELECT
user_id, SUM(revenue) as s_revenue
FROM
(
SELECT
user_id, revenue, publish_month,
MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
FROM
yourtableyoudidnotname
)
summarised
WHERE
publish_month >= DATEADD(month, -2, user_latest_publish_month)
GROUP BY
user_id
If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE())) to the WHERE clause.
The ambiguity here is why it is important to include a Minimal Reproducible Example.
With input data and required results, we could test our code against your requirements.
If you're using strings for the publish_month, you shouldn't be, and should fix that with utmost urgency.
You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.
Here is how:
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
Now you just select the rows with an RN of 3 or less and do your sum:
SELECT user_id, SUM(revenue) as s_revenue
FROM (
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
) X
WHERE RN <= 3
GROUP BY user_id
You could also do this without a sub query if you use the windowing function for SUM and a range, but I think this is easier to understand.
From the comments: there could be an issue if you have months from more than one year. To solve this, make the biggest number in the ORDER BY always the most recent. So instead of
ORDER BY publish_month DESC
you would have
ORDER BY (100*publish_year)+publish_month DESC
This means more recent years will always have a higher number, so January of 2023 will be 202301 while December of 2022 will be 202212. Since January's number is bigger, it will get a row number of 1 and December will get a row number of 2.
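As a quick check of the ROW_NUMBER approach, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The table name revenue_by_month and the rows are invented for the example, and publish_month is stored as ISO-8601 text, which sorts chronologically across year boundaries without the year*100+month trick.

```python
import sqlite3

# Hypothetical data: (user_id, revenue, publish_month as ISO date text).
rows = [
    (33, 10, "2022-10-01"),
    (33, 20, "2022-11-01"),
    (33, 30, "2022-12-01"),
    (33, 40, "2023-01-01"),
    (44, 5, "2022-11-01"),
    (44, 7, "2022-12-01"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "create table revenue_by_month (user_id int, revenue int, publish_month text)"
)
con.executemany("insert into revenue_by_month values (?, ?, ?)", rows)

query = """
select user_id, sum(revenue) as s_revenue
from (select user_id, revenue,
             -- 1 = most recent row for the user, 2 = the one before, ...
             row_number() over (partition by user_id
                                order by publish_month desc) as rn
      from revenue_by_month) x
where rn <= 3
group by user_id
order by user_id
"""
totals = dict(con.execute(query).fetchall())
print(totals)  # {33: 90, 44: 12}
```

Note that this keeps the three most recent rows per user, so a user with a gap in their months still gets three rows, whereas the DATEADD version keeps whatever falls inside a three-month calendar window ending at the user's latest month.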

How to spread annual amount and then add by month in SQL

Currently I'm working with a table that looks like this:
Month | Transaction | amount
2021-07-01| Annual Membership Fee| 45
2021-08-01| Annual Membership Fee| 145
2021-09-01| Annual Membership Fee| 2940
2021-10-01| Annual Membership Fee| 1545
The amount in that table is the total monthly amount (ex. I have 100 customers who paid $15 for the annual membership, so my total monthly amount would be $1500).
However, what I would like to do (and I have no clue how) is divide the amount by 12 and spread it into the future in order to get the revenue per month. As an example, for 2021-09-01 I would get the following:
$2490/12 = $207.5 (dollars per month for the next 12 months)
In 2021-09-01 I would only get $207.5 for that specific month.
In 2021-10-01 I would get $1545/12 = $128.75 plus $207.5 from the previous month (total = $336.25 for 2021-10-01).
The same operation would repeat onwards. The last period in which I would collect my $207.5 from 2021-09-01 would be 2022-08-01.
I was wondering if someone could give me an idea of how to perform this in a SQL query/CTE?
Assuming all the months you care about exist in your table, I would suggest something like:
SELECT
month,
(SELECT SUM(m2.amount/12) FROM mytable m2 WHERE m2.month BETWEEN ADD_MONTHS(m1.month, -11) AND m1.month) as monthlyamount
FROM mytable m1
GROUP BY month
ORDER BY month
For each month that exists in the table, this sums 1/12th of the current amount plus the previous 11 months (using the add_months function). I think that's what you want.
A few notes/thoughts:
I'm assuming (based on the column name) that all the dates in the month column end on the 1st, so we don't need to worry about matching days or having the group by return multiple rows for the same month.
You might want to round the SUMs I did, since in some cases dividing by 12 might give you more digits after the decimal than you want for money (although, in that case, you might also have to consider remainders).
If you really only have one transaction per month (like in your example), you don't need to do the group by.
If the months you care about don't exist in your table, then this won't work, but you could do the same thing by generating a table of months. E.g. if you have an amount on 2020-01-01 but nothing on 2020-02-01, then this won't return a row for 2020-02-01.
CTE = set up dataset
CTE_2 = pro-rate dataset
FINAL SQL = select future_cal_month,sum(pro_rated_amount) from cte_2 group by 1
with cte as (
select '2021-07-01' cal_month,'Annual Membership Fee' transaction ,45 amount
union all select '2021-08-01' cal_month,'Annual Membership Fee' transaction ,145 amount
union all select '2021-09-01' cal_month,'Annual Membership Fee' transaction ,2940 amount
union all select '2021-10-01' cal_month,'Annual Membership Fee' transaction ,1545 amount)
, cte_2 as (
select
-- offsets 0..11: the first twelfth is recognised in the payment month itself
dateadd('month', row_number() over (partition by cal_month order by 1) - 1, cal_month) future_cal_month
,amount/12 pro_rated_amount
from
cte
,table(generator(rowcount => 12)) v)
select
future_cal_month
, sum(pro_rated_amount)
from
cte_2
group by
future_cal_month
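To sanity-check the numbers either query should produce, here is a small plain-Python sketch of the same pro-rating: each monthly total is split into 12 equal slices, recognised in the payment month and the following 11 months. It uses the amounts from the question's table (so September contributes 2940/12 = 245 per month, not the 2490 used in the question's worked arithmetic); the add_months helper is written for this example and is not a library function.

```python
from collections import defaultdict
from datetime import date


def add_months(d, n):
    """Return the first day of the month that is n months after d."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


# Monthly totals taken from the question's table.
fees = {
    date(2021, 7, 1): 45,
    date(2021, 8, 1): 145,
    date(2021, 9, 1): 2940,
    date(2021, 10, 1): 1545,
}

monthly_revenue = defaultdict(float)
for start, amount in fees.items():
    for offset in range(12):  # the payment month plus the next 11 months
        monthly_revenue[add_months(start, offset)] += amount / 12

for month in sorted(monthly_revenue):
    print(month, round(monthly_revenue[month], 2))
```

For 2021-10-01 this gives (45 + 145 + 2940 + 1545) / 12, about 389.58, and the last month with any revenue is 2022-09-01, which only carries the October slice of 128.75.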

Count number of years since last deduction

I have a table similar to the one below, where the same account has its fiscal years (FY) and deductions for each year broken out in multiple rows. Accounts can range from 1 - 20+ years. How do I group to one unique row that shows the current year and how many years it's been since the account had a deduction?
from this:
to this:
Started to utilize the CTE approach as I have in the past, but as before it started to get ugly and I know there has to be a simpler approach...
Assuming the current year is the most recent year, you would use aggregation:
select account, max(fy) as current_fy,
       sum(case when fy = max_fy then deductions end) as this_year_deduction,
       max(fy) - max(case when deductions < 0 then fy end) as years_since_deduction
from (select t.*, max(fy) over (partition by account) as max_fy
      from t
     ) t
group by account;
Note: I assume the third column is the most recent deduction. The query uses a window function to extract that.
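Here is a minimal sketch of the conditional-aggregation idea, run against SQLite from Python. The rows are invented, the column names account, fy and deductions are assumed from the query above, and deductions are assumed to be stored as negative amounts (hence the < 0 test).

```python
import sqlite3

# Hypothetical rows: (account, fy, deductions); deductions are negative amounts.
rows = [
    ("A", 2018, -100),
    ("A", 2019, 0),
    ("A", 2020, 0),
    ("A", 2021, 0),
    ("B", 2020, 0),
    ("B", 2021, -50),
]

con = sqlite3.connect(":memory:")
con.execute("create table t (account text, fy int, deductions int)")
con.executemany("insert into t values (?, ?, ?)", rows)

query = """
select account,
       max(fy) as current_fy,
       -- latest fiscal year that actually had a deduction
       max(fy) - max(case when deductions < 0 then fy end) as years_since_deduction
from t
group by account
order by account
"""
result = con.execute(query).fetchall()
print(result)  # [('A', 2021, 3), ('B', 2021, 0)]
```

An account that never had a deduction would get NULL in the last column, since the CASE expression yields no year to subtract.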
Haven't used the methods below but I think it is close to what is needed. Corrections welcome. (Code not tested)
-- for every account and fiscal year: years elapsed since the latest year with a non-zero deduction
select Account,
       FY,
       FY - MAX(case when deductions <> 0 then FY end)
                OVER (PARTITION BY Account
                      ORDER BY FY
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS years_since_deductions
from YourTable

How to capture first row in a grouping and subsequent rows that are each a minimum of 15 days apart?

Assume a given insurance will only pay for the same patient visiting the same doctor once in 15 days. If the patient comes once, twice, or twenty times within those 15 days to the doctor, the doctor will get only one payment. If the patient comes again on Day 16 or Day 18 or Day 29 (or all three!), the doctor will get a second payment. The first visit (or first after the 15 day interval) is always the one that must be billed, along with its complaint.
The SQL for all visits can be loosely expressed as follows:
SELECT VisitID
,PatientID
,VisitDtm
,DoctorID
,ComplaintCode
FROM Visits
The goal is to query the Visits table in a way that would capture only billable incidents.
I have been trying to work through this using the question "Group rows with that are less than 15 days apart and assign min/max date", which is in essence quite similar. However, that won't work for me because, as the accepted answerer (Salman A) points out: "Note that this could group much longer date ranges together e.g. 01-01, 01-11, 01-21, 02-01 and 02-11 will be grouped together although the first and last dates are more than 15 days apart." This presents a problem for me, as it is a requirement to always capture the next incident after 15 days have passed from the first incident.
I have spent quite a few hours thinking this through and poring over like problems, and am looking for help in understanding the path to a solution, not necessarily an actual code solution. If it's easier to answer in the context of a code solution, that is fine. Any and all guidance is very much appreciated!
This type of task requires an iterative process so you can keep track of the last billable visit. One approach is a recursive cte.
You would typically enumerate the visits of each patient using row_number(), then traverse the dataset starting from the first visit, while keeping track of the last "billable" visit. Once a visit is met that is at least 15 days later than the last billable visit, the value resets.
with
data as (
select visitid, patientid, visitdtm, doctorid,
row_number() over(partition by patientid order by visitdtm) rn
from visits
),
cte as (
select d.*, visitdtm as billabledtm from data d where rn = 1
union all
select d.*,
case when d.visitdtm >= dateadd(day, 15, c.billabledtm)
then d.visitdtm
else c.billabledtm
end
from cte c
inner join data d
on d.patientid = c.patientid and d.rn = c.rn + 1
)
select * from cte where visitdtm = billabledtm order by patientid, rn
If a patient may have more than 100 visits, then you need to add option (maxrecursion 0) at the very end of the query.
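The rule the recursive CTE walks through can be sketched in a few lines of Python, which may help when checking results by hand. The function name billable_visits is made up for this example, it handles the visits of a single patient/doctor pair, and the dates are the ones quoted in the question with an assumed year of 2021.

```python
from datetime import date, timedelta


def billable_visits(visit_dates, window_days=15):
    """Return the visits that open a new billing window.

    The first visit is always billable; a later visit is billable only if it
    falls at least `window_days` days after the previous *billable* visit.
    """
    billable = []
    last_billed = None
    for visit in sorted(visit_dates):
        if last_billed is None or visit >= last_billed + timedelta(days=window_days):
            billable.append(visit)
            last_billed = visit
    return billable


visits = [
    date(2021, 1, 1),
    date(2021, 1, 11),
    date(2021, 1, 21),
    date(2021, 2, 1),
    date(2021, 2, 11),
]
for d in billable_visits(visits):
    print(d)  # 2021-01-01, 2021-01-21, 2021-02-11
```

Because each visit is compared with the last billable visit rather than with the immediately preceding one, the chain 01-01, 01-11, 01-21, ... is not collapsed into a single group.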
Here's another approach. Similar to GMB's this adds a row_number to the Visits table in a CTE but it also adds the lead date difference between VisitDtm's. Then it takes cumulative "sum over" of the date difference and divides by 15. When that quotient increases by a full integer, it represents a billable event in the data.
Something like this
;with lead_cte as (
    select v.*, row_number() over (partition by PatientId order by VisitDtm) rn,
           datediff(d, VisitDtm, lead(VisitDtm) over (partition by PatientId order by VisitDtm)) lead_dt_diff
    from Visits v),
cum_sum_cte as (
    select lc.*, sum(lead_dt_diff) over (partition by PatientId order by VisitDtm)/15 cum_dt_diff
    from lead_cte lc),
min_billable_cte as (
    select PatientId, cum_dt_diff, min(rn) min_rn
    from cum_sum_cte
    group by PatientId, cum_dt_diff)
select lc.*
from lead_cte lc
     join min_billable_cte mbc on lc.PatientId=mbc.PatientId
                              and lc.rn=mbc.min_rn;

Top 10 based on last month showing 6 previous months

I want to show a graph of income from different parties over the last 6 months, limited to the 10 people with the highest income in the last month.
This can change each month, as the top 10 people can change when they deposit more money. The graph should show these 10 people's deposits over the last 6 months, with the top 10 determined by the last month's deposits only.
I already used a LAG function and a RANK() OVER PARTITION function.
I don't understand why you would need rank or lag functions.
You can simply use an IN condition with a subquery:
SELECT * FROM YourTable t
WHERE t.depositDate between StartRangeDate and EndRangeDate
AND t.ID in (select ID from (SELECT s.id, sum(s.depositAmount) as total
                             from YourTable s
                             where s.depositDate between ThisMonthStart and ThisMonthEnd
                             group by s.id
                             order by total desc  -- largest totals first
                             limit 10) top10)
You can adjust the outer select to return whatever you need, for example by adding a group by and summing the deposits.
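A small sketch of the same idea, run against SQLite from Python with invented data: the inner query ranks people by their deposits in the ranking month (sorting descending so that limit keeps the largest totals), and the outer query sums the deposits of those people over the whole range. The table name deposits, the literal date ranges, and the use of a top 2 instead of a top 10 are all assumptions for the example.

```python
import sqlite3

# Hypothetical deposits: (id, depositDate, depositAmount).
rows = [
    ("ann", "2023-01-10", 100),
    ("ann", "2023-03-05", 500),
    ("bob", "2023-02-11", 900),
    ("bob", "2023-03-07", 50),
    ("cat", "2023-01-20", 10),
    ("cat", "2023-03-09", 300),
]

con = sqlite3.connect(":memory:")
con.execute("create table deposits (id text, depositDate text, depositAmount int)")
con.executemany("insert into deposits values (?, ?, ?)", rows)

query = """
select t.id, sum(t.depositAmount) as total
from deposits t
where t.depositDate between '2023-01-01' and '2023-03-31'
  and t.id in (select id
               from (select s.id, sum(s.depositAmount) as month_total
                     from deposits s
                     where s.depositDate between '2023-03-01' and '2023-03-31'
                     group by s.id
                     order by month_total desc
                     limit 2) top_n)
group by t.id
order by t.id
"""
result = con.execute(query).fetchall()
print(result)  # [('ann', 600), ('cat', 310)]
```

bob deposited the most over the whole range but is left out, because only the deposits in the ranking month decide who makes the list.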