Top 10 based on last month showing 6 previous months - sql

I want to show a graph of income from different parties over the last 6 months, limited to the 10 people with the highest income in the last month.
The top 10 can change each month as people deposit more money, so the graph should show the last 6 months of deposits for whoever is in the top 10 for the most recent month.
I have already tried a LAG function and a RANK() OVER (PARTITION BY ...) function.

I don't see why you would need RANK or LAG functions here.
You can simply use an IN condition:
SELECT *
FROM YourTable t
WHERE t.depositDate BETWEEN StartRangeDate AND EndRangeDate
  AND t.ID IN (SELECT top10.ID
               FROM (SELECT s.ID, SUM(s.depositAmount) AS total
                     FROM YourTable s
                     WHERE s.depositDate BETWEEN ThisMonthStart AND ThisMonthEnd
                     GROUP BY s.ID
                     ORDER BY total DESC  -- highest totals first
                     LIMIT 10) top10)
You can adjust the outer SELECT to return whatever you need, for example by adding a GROUP BY and summing the deposits per person and month.
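To make the idea concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name deposits, its columns, the sample rows and the function name are all made up for illustration, and the date boundaries are hard-coded. It picks the top N parties by last-month total and then sums their deposits per month over the whole range.

```python
import sqlite3

# Hypothetical sample data: (party id, deposit date, amount).
DEPOSITS = [
    ("A", "2024-05-03", 10), ("A", "2024-06-10", 100),
    ("B", "2024-01-15", 500), ("B", "2024-06-20", 50),
    ("C", "2024-05-07", 1000), ("C", "2024-06-02", 5),
]

QUERY = """
SELECT d.id,
       strftime('%Y-%m', d.deposit_date) AS month,
       SUM(d.amount) AS total
FROM deposits d
WHERE d.deposit_date BETWEEN :range_start AND :range_end
  AND d.id IN (SELECT s.id
               FROM deposits s
               WHERE s.deposit_date BETWEEN :month_start AND :month_end
               GROUP BY s.id
               ORDER BY SUM(s.amount) DESC   -- biggest last-month totals first
               LIMIT :top_n)
GROUP BY d.id, strftime('%Y-%m', d.deposit_date)
ORDER BY d.id, month
"""


def top_depositors_history(top_n):
    """Monthly totals over six months for the top_n parties of the last month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deposits (id TEXT, deposit_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO deposits VALUES (?, ?, ?)", DEPOSITS)
    params = {
        "range_start": "2024-01-01", "range_end": "2024-06-30",
        "month_start": "2024-06-01", "month_end": "2024-06-30",
        "top_n": top_n,
    }
    rows = conn.execute(QUERY, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_depositors_history(2):
        print(row)
```

In this sample, party C has the largest deposit of the whole period but is left out with top_n = 2, because only the last month decides who is shown.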

Related

How to conditional SQL select

My table consists of user_id, revenue, publish_month columns.
Right now I use group_by user_id and sum(revenue) to get revenue for all individual users.
Is there a single SQL query I can use to query for user revenue across a time period conditionally? If for a specific user, there is a row for this month, I want to query for this month, last month and the month before. If there is not yet a row for this month, I want to query for last month and the two months before.
Any advice on which approach to take would be helpful: should I be using CASE expressions, IF/ELSE with EXISTS, or is this doable with a single SQL query?
UPDATE: since I did a bad job of describing the question, I have added some example data and expected results.
Where current month is not present for user 33
Where current month is present
Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user...
SELECT
    user_id, SUM(revenue) AS s_revenue
FROM
(
    SELECT
        user_id, revenue, publish_month,
        MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
    FROM
        yourtableyoudidnotname
)
    summarised
WHERE
    publish_month >= DATEADD(month, -2, user_latest_publish_month)
GROUP BY
    user_id
If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE()))
The ambiguity here is why it is important to include a Minimal Reproducible Example.
With input data and required results, we could test our code against your requirements.
If you're using strings for publish_month, you shouldn't be, and you should fix that with the utmost urgency.
You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.
Here is how:
SELECT user_id, revenue, publish_month,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
Now you just select the items with an RN of 3 or less and do your sum:
SELECT user_id, SUM(revenue) as s_revenue
FROM (
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
) X
WHERE RN <= 3
GROUP BY user_id
You could also do this without a sub query if you use the windowing function for SUM and a range, but I think this is easier to understand.
From the comments: there could be an issue if publish_month holds only a month number and the data spans more than one year. To solve this, make the value in the ORDER BY largest for the most recent month. So instead of
ORDER BY publish_month DESC
you would have
ORDER BY (100*publish_year)+publish_month DESC
This means more recent years will always have a higher number, so January of 2023 will be 202301 while December of 2022 will be 202212. Since January has the bigger number, it will get a row number of 1 and December will get a row number of 2.
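As a quick check of the ROW_NUMBER approach, here is a small sketch that can be run from Python with the built-in sqlite3 module. The table name monthly_revenue and the sample rows are invented for the example, and publish_month is stored as an ISO date string so that it sorts correctly across years without the year*100 trick. Note that it keeps the three most recent rows per user, which is only the same as three calendar months if no month is missing.

```python
import sqlite3

# Hypothetical sample rows: (user_id, revenue, publish_month).
ROWS = [
    (33, 10, "2022-11-01"), (33, 20, "2022-12-01"),
    (33, 30, "2023-01-01"), (33, 40, "2023-02-01"),
    (7, 5, "2023-01-01"), (7, 7, "2023-03-01"),
]

QUERY = """
SELECT user_id, SUM(revenue) AS s_revenue
FROM (SELECT user_id, revenue,
             ROW_NUMBER() OVER (PARTITION BY user_id
                                ORDER BY publish_month DESC) AS rn
      FROM monthly_revenue) x
WHERE rn <= 3          -- the three most recent rows per user
GROUP BY user_id
ORDER BY user_id
"""


def latest_three_months_revenue():
    """Sum the revenue of each user's three most recent rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_revenue "
        "(user_id INTEGER, revenue INTEGER, publish_month TEXT)"
    )
    conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_three_months_revenue())
```

For user 33 the November 2022 row is dropped and the remaining three rows, which straddle the year boundary, are summed.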

Summarizing consecutive values in T-SQL

I have a question regarding T-SQL:
I have a database of my insurance clients, who have a contractual obligation to pay the company an insurance fee every month. An example of the dataset is presented below:
I have the date, the client_id and the overdue_flag. The latter is binary: 0 if the given client has no overdue payment and 1 if he/she has. The question: I would like to create a summary of overdue months (see image 2).
If it's the first month the client is overdue then it should be 1, if it's the second then 2, and so on. However, if the client comes clean (makes good on overdue payments) it should go back to 0, and if the same client is overdue again, the count of overdue months should restart from 1. In other words: I would only like to sum the consecutive overdue months.
Thanks in advance for the help!
Use a cumulative sum to define the groups by the number of non-overdue months before each row. Then row_number() for the overdue periods:
select t.*,
       (case when overdue_flag = 1
             then row_number() over (partition by client_id, grp, overdue_flag order by date)
             else 0  -- not overdue: the count goes back to 0
        end) as months_overdue
from (select t.*,
             sum(1 - overdue_flag) over (partition by client_id order by date) as grp
      from t
     ) t
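The grouping trick is easier to follow on a tiny data set. The sketch below runs the same query shape through Python's sqlite3 module. The table name payments, the column pay_date and the sample flags are assumptions made for the example, and non-overdue months are reported as 0.

```python
import sqlite3

# Hypothetical sample rows: (client_id, pay_date, overdue_flag).
PAYMENTS = [
    (1, "2024-01-01", 0), (1, "2024-02-01", 1), (1, "2024-03-01", 1),
    (1, "2024-04-01", 0), (1, "2024-05-01", 1), (1, "2024-06-01", 1),
    (1, "2024-07-01", 1),
    (2, "2024-01-01", 1), (2, "2024-02-01", 0),
]

QUERY = """
SELECT client_id, pay_date,
       CASE WHEN overdue_flag = 1
            THEN ROW_NUMBER() OVER (PARTITION BY client_id, grp, overdue_flag
                                    ORDER BY pay_date)
            ELSE 0
       END AS months_overdue
FROM (SELECT p.*,
             -- grp grows by one on every non-overdue month, so each run of
             -- consecutive overdue months shares one grp value
             SUM(1 - overdue_flag) OVER (PARTITION BY client_id
                                         ORDER BY pay_date) AS grp
      FROM payments p) t
ORDER BY client_id, pay_date
"""


def months_overdue():
    """Return (client_id, pay_date, consecutive overdue months) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments "
        "(client_id INTEGER, pay_date TEXT, overdue_flag INTEGER)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", PAYMENTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in months_overdue():
        print(row)
```

For client 1 the flags 0, 1, 1, 0, 1, 1, 1 turn into the counts 0, 1, 2, 0, 1, 2, 3, so the counter restarts after the clean month.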

How to capture first row in a grouping and subsequent rows that are each a minimum of 15 days apart?

Assume a given insurance will only pay for the same patient visiting the same doctor once in 15 days. If the patient comes once, twice, or twenty times within those 15 days to the doctor, the doctor will get only one payment. If the patient comes again on Day 16 or Day 18 or Day 29 (or all three!), the doctor will get a second payment. The first visit (or first after the 15 day interval) is always the one that must be billed, along with its complaint.
The SQL for all visits can be loosely expressed as follows:
SELECT VisitID
,PatientID
,VisitDtm
,DoctorID
,ComplaintCode
FROM Visits
The goal is to query the Visits table in a way that would capture only billable incidents.
I have been trying to work through this question, which is in essence quite similar to "Group rows with that are less than 15 days apart and assign min/max date". However, the reason this won't work for me is that, as the accepted answerer (Salman A) points out: "Note that this could group much longer date ranges together e.g. 01-01, 01-11, 01-21, 02-01 and 02-11 will be grouped together although the first and last dates are more than 15 days apart." This presents a problem for me as it is a requirement to always capture the next incident after 15 days have passed from the first incident.
I have spent quite a few hours thinking this through and poring over like problems, and am looking for help in understanding the path to a solution, not necessarily an actual code solution. If it's easier to answer in the context of a code solution, that is fine. Any and all guidance is very much appreciated!
This type of task requires an iterative process so you can keep track of the last billable visit. One approach is a recursive CTE.
You would typically enumerate the visits of each patient using row_number(), then traverse the dataset starting from the first visit while keeping track of the last "billable" visit. Once a visit is found that is at least 15 days later than the last billable visit, the value resets.
with
    data as (
        select visitid, patientid, visitdtm, doctorid,
               row_number() over(partition by patientid order by visitdtm) rn
        from visits
    ),
    cte as (
        select d.*, visitdtm as billabledtm from data d where rn = 1
        union all
        select d.*,
               case when d.visitdtm >= dateadd(day, 15, c.billabledtm)
                    then d.visitdtm
                    else c.billabledtm
               end
        from cte c
        inner join data d
            on d.patientid = c.patientid and d.rn = c.rn + 1
    )
select * from cte where visitdtm = billabledtm order by patientid, rn
If a patient may have more than 100 visits, then you need to add option (maxrecursion 0) at the very end of the query.
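To see the recursive walk in action, here is a sketch of the same idea that runs on SQLite from Python. The sample visits are invented, dates are stored as ISO strings, SQLite's date(x, '+15 days') stands in for dateadd, and the function name is made up. Unlike the query above, it partitions by both patient and doctor, which is my reading of the question and not something the answer states.

```python
import sqlite3

# Hypothetical sample rows: (visitid, patientid, doctorid, visitdtm).
VISITS = [
    (1, 1, 10, "2024-01-01"), (2, 1, 10, "2024-01-05"),
    (3, 1, 10, "2024-01-16"), (4, 1, 10, "2024-01-20"),
    (5, 1, 10, "2024-02-05"),
    # The date sequence quoted in the question.
    (6, 2, 10, "2024-01-01"), (7, 2, 10, "2024-01-11"),
    (8, 2, 10, "2024-01-21"), (9, 2, 10, "2024-02-01"),
    (10, 2, 10, "2024-02-11"),
]

QUERY = """
WITH RECURSIVE
data AS (
    SELECT visitid, patientid, doctorid, visitdtm,
           ROW_NUMBER() OVER (PARTITION BY patientid, doctorid
                              ORDER BY visitdtm) AS rn
    FROM visits
),
cte AS (
    -- the first visit of each patient/doctor pair is always billable
    SELECT visitid, patientid, doctorid, visitdtm, rn,
           visitdtm AS billabledtm
    FROM data
    WHERE rn = 1
    UNION ALL
    -- step to the next visit, carrying the last billable date forward
    SELECT d.visitid, d.patientid, d.doctorid, d.visitdtm, d.rn,
           CASE WHEN d.visitdtm >= date(c.billabledtm, '+15 days')
                THEN d.visitdtm
                ELSE c.billabledtm
           END
    FROM cte c
    JOIN data d
      ON d.patientid = c.patientid
     AND d.doctorid = c.doctorid
     AND d.rn = c.rn + 1
)
SELECT patientid, visitdtm
FROM cte
WHERE visitdtm = billabledtm
ORDER BY patientid, doctorid, rn
"""


def billable_visits():
    """Return (patientid, visitdtm) for every billable visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits "
        "(visitid INTEGER, patientid INTEGER, doctorid INTEGER, visitdtm TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", VISITS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in billable_visits():
        print(row)
```

For patient 2 the dates 01-01, 01-11, 01-21, 02-01 and 02-11 yield billable visits on 01-01, 01-21 and 02-11, because the 15-day clock restarts at each billable visit instead of chaining neighbouring visits into one long group.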
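Here's another approach. Similar to GMB's, this adds a row_number to the Visits table in a CTE, but it also adds the date difference to the next visit (lead) between VisitDtm values. Then it takes a cumulative "sum over" of the date differences and divides by 15. When that quotient increases by a full integer, it is treated as a billable event in the data. Note that because the buckets are derived from cumulative day counts rather than from the last billable visit, the result can differ from the recursive approach.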
Something like this
;with lead_cte as (
    select v.*, row_number() over (partition by PatientId order by VisitDtm) rn,
           datediff(d, VisitDtm, lead(VisitDtm) over (partition by PatientId order by VisitDtm)) lead_dt_diff
    from Visits v),
cum_sum_cte as (
    select lc.*, sum(lead_dt_diff) over (partition by PatientId order by VisitDtm)/15 cum_dt_diff
    from lead_cte lc),
min_billable_cte as (
    select PatientId, cum_dt_diff, min(rn) min_rn
    from cum_sum_cte
    group by PatientId, cum_dt_diff)
select lc.*
from lead_cte lc
join min_billable_cte mbc on lc.PatientId=mbc.PatientId
                         and lc.rn=mbc.min_rn;

Average Group size per month Over previous ten years

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
So I have no problem getting the average group size for each month. For the Current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into Excel to get the average for the current month.
For the previous month, I want exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I can continue doing this all the way back ten years, but I know there is a better way. I have no problem running this query 120 times and copying and pasting the results into Excel to compute the average. However, I'd rather learn a more efficient way to do this.
Another question: I can't do the following. Does anyone know a way around it?
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gne.TerminationDate < DATEADD(day, -14,GETDATE()))
group by ER.EmployerName
Are you after a query that shows the count by the year and month they were added? If so, this seems pretty straightforward.
This uses the YEAR and MONTH date functions, which exist in both MySQL and SQL Server.
Select AVG(cnt) FROM (
    Select count(*) cnt, Year(dateAdded) yr, Month(dateAdded) mo
    from System_Users su
    join system_Employers se on se.employerid = su.employerid
    group by Year(dateAdded), Month(dateAdded)) B
The inner query breaks the counts out by year and month. We then wrap that in an outer query to show the average.
--2nd attempt but I'm Brain FriDay'd out.
This uses a Common Table Expression (CTE) to generate a running count of the employees by year and month, and then averages it out by month.
If this isn't what you're after, sample data with expected results would help frame the question better, so I can stop making assumptions about what you need/want.
With CTE AS (
    Select Year(dateAdded) YR, Month(dateAdded) MO,
           count(*) over (partition by Year(dateAdded), Month(dateAdded) order by dateAdded Asc) as RunningTotal
    from System_Users su
    join system_Employers se on se.employerid = su.employerid)
Select MO, avg(RunningTotal) as AvgRunningTotal
from CTE
group by MO
order by MO;
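On the avg(count(*)) part of the question: aggregates cannot be nested directly, but the count can be computed in a derived table and averaged in the outer query. The sketch below applies that to the "as of each month" requirement, using SQLite from Python. The months table, the simplified employees table, the sample rows and the function name are all assumptions for illustration. Terminations are ignored, and an employer with no employees yet in a given month is simply not counted for that month.

```python
import sqlite3

# Hypothetical sample rows: (employerid, dateadded).
EMPLOYEES = [
    (1, "2024-01-10"), (1, "2024-01-20"), (1, "2024-03-05"),
    (2, "2024-02-15"),
]
# One row per reporting month, holding the last day of that month.
MONTH_ENDS = [("2024-01-31",), ("2024-02-29",), ("2024-03-31",)]

QUERY = """
SELECT month_end, AVG(cnt) AS avg_group_size
FROM (SELECT m.month_end, e.employerid, COUNT(*) AS cnt
      FROM months m
      JOIN employees e ON e.dateadded <= m.month_end  -- employed as of month end
      GROUP BY m.month_end, e.employerid) per_group
GROUP BY month_end
ORDER BY month_end
"""


def average_group_size_by_month():
    """Return (month_end, average employees per employer) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (employerid INTEGER, dateadded TEXT)")
    conn.execute("CREATE TABLE months (month_end TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO months VALUES (?)", MONTH_ENDS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in average_group_size_by_month():
        print(row)
```

With ten years of data the months table would hold 120 rows, so a single query replaces the 120 manual runs.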

How to use percentile and rank to give avg days for 90 percent of orders

I have table called order and it has 3 columns I'm interested in: order ID, day order placed, day fulfilled. order ID is unique.
I need to find out how many days, on average, the first 90% of the orders placed in January of 2016 took to be fulfilled.
If order 1 was fulfilled in 1 day, order 2 in 2 days, order 3 in 3 days... order 10 in 10 days, then I would need to calculate as such:
number of orders = 10
90% of 10 = 9
the first 9 of those 10 orders that were fulfilled, when arranged in ascending order, took: 1+2+3+4+5+6+7+8+9 = 45 days to fulfill
hence, avg day for first 90% of orders fulfilled is: 45/9 = 5 days.
How can I write a query to first arrange orders by "number of days to fulfill" and then calculate avg days it took for the first 90% of orders for that period?
First, we would have to assume that most of the orders have been filled from January.
Second, you can do this with analytic functions. Although the percentile functions work, I usually do this the old-fashioned way, using cumulative counts. This returns the smallest number of days within which at least 90% of the orders were fulfilled:
select min(days)
from (select (coalesce(datefulfilled, trunc(sysdate)) - dateordered) as days,
             sum(count(*)) over (order by (coalesce(datefulfilled, trunc(sysdate)) - dateordered)) as cumecnt,
             sum(count(*)) over () as totalcnt
      from orders o
      group by (coalesce(datefulfilled, trunc(sysdate)) - dateordered)
     ) d
where cumecnt >= 0.9 * totalcnt;
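If what you want is the average over the fastest 90% of orders, as in the worked example in the question, one option is to number the orders by days to fulfil and average only the rows within the first 90%. Here is a sketch of that on SQLite from Python. The table and column names and the sample orders are invented, unfulfilled orders are skipped, and the cut-off is written with integer arithmetic to avoid floating-point rounding.

```python
import sqlite3

# Hypothetical sample data: ten January 2016 orders taking 1..10 days,
# plus one February order that the filter should ignore.
ORDERS = [
    (n, "2016-01-01", f"2016-01-{1 + n:02d}") for n in range(1, 11)
] + [(11, "2016-02-01", "2016-03-01")]

QUERY = """
SELECT AVG(days)
FROM (SELECT days,
             ROW_NUMBER() OVER (ORDER BY days) AS rn,
             COUNT(*) OVER () AS total
      FROM (SELECT CAST(julianday(date_fulfilled) - julianday(date_placed)
                        AS INTEGER) AS days
            FROM orders
            WHERE date_placed BETWEEN '2016-01-01' AND '2016-01-31'
              AND date_fulfilled IS NOT NULL) t
     ) ranked
WHERE rn * 10 <= total * 9   -- keep the first 90% of rows
"""


def avg_days_first_90_percent():
    """Average days to fulfil for the fastest 90% of January 2016 orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER, date_placed TEXT, date_fulfilled TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    (result,) = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(avg_days_first_90_percent())
```

With the ten sample orders this keeps the nine fastest ones and averages 45 days over 9 orders, matching the 5 days in the question.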