Rolling total in SQL that Resets to 0 when going above 90 - sql

First time post. Learning SQL over the past 6 months so help is appreciated. I have data structured as below:
CREATE TABLE #tmp4 (
AccountNumber int,
Date date,
DateRank int
)
INSERT INTO #tmp4
VALUES (001, '11/13/2018' , 1)
, (002, '12/19/2018', 2)
, (003, '1/23/2019' , 3)
, (004, '2/5/2019' , 4)
, (005, '3/10/2019' , 5)
, (006, '3/20/2019' , 6)
, (007, '4/8/2019' , 7)
, (008, '5/20/2019' , 8)
What I need to do with this data is calculate a rolling total that resets to 0 once a threshold of 90 days is reached. I have used the DateDiff function to calculate the DateDiffs between consecutive dates and have tried multiple things using LAG and other window functions but can't make it reset. The goal is to find "index visits" which can only occur once every 90 days. So my plan is to have a field that reads 0 on the first visit and resets to 0 for the next stay after 90 days is up from the first visit then only pull visits with a value of 0.
One solution I tried was correct for most sets but did not return the right values for the above set (rows 4 and 8 should start over as "index visits").
The results I would expect for this query would be:
Account | Date | DateRank | RollingTotal
001 |'11/13/2018' | 1 | 0
002 |'12/19/2018' | 2 | 35
003 |'1/23/2019' | 3 | 71
004 |'2/5/2019' | 4 | 84
005 |'3/10/2019' | 5 | 0 (not 117)
006 |'3/20/2019' | 6 | 10
007 |'4/8/2019' | 7 | 29
008 |'5/20/2019' | 8 | 71
Thanks for any help.
Here's the code I tried:
CREATE TABLE #tmp2
(EmrNumber varchar(255)
, AdmitDateTime datetime
, DateRank int
, LagDateDiff int
, RunningTotal int
)
INSERT INTO #tmp2
SELECT tmp1.EmrNumber
, tmp1.AdmitDateTime
, tmp1.DateRank
--, LAG(tmp1.AdmitDateTime) OVER(PARTITION BY tmp1.EmrNumber ORDER BY tmp1.DateRank) as NextAdmitDate
, -DATEDIFF(DAY, tmp1.AdmitDateTime, LAG(tmp1.AdmitDateTime) OVER(PARTITION BY tmp1.EmrNumber ORDER BY tmp1.DateRank)) LagDateDiff
, IIF((SELECT SUM(sumt.total)
FROM (
SELECT -DATEDIFF(DAY, tmpsum.AdmitDateTime, LAG(tmpsum.AdmitDateTime) OVER(PARTITION BY tmpsum.EmrNumber ORDER BY tmpsum.DateRank)) total
FROM #tmp tmpsum
WHERE tmp1.EmrNumber = tmpsum.EmrNumber
AND tmpsum.AdmitDateTime <= tmp1.AdmitDateTime
) sumt) IS NULL, 0, (SELECT SUM(sumt.total)
FROM (
SELECT -DATEDIFF(DAY, tmpsum.AdmitDateTime, LAG(tmpsum.AdmitDateTime) OVER(PARTITION BY tmpsum.EmrNumber ORDER BY tmpsum.DateRank)) total
FROM #tmp tmpsum
WHERE tmp1.EmrNumber = tmpsum.EmrNumber
AND tmpsum.AdmitDateTime <= tmp1.AdmitDateTime
) sumt) ) as RunningTotal
FROM #tmp tmp1
SELECT *
, CASE WHEN LagDateDiff >90 THEN 0
WHEN RunningTotal = 0 THEN 0
ELSE LAG(LagDateDiff) OVER(PARTITION BY EmrNumber ORDER BY DateRank) + RunningTotal END AS RollingTotal
FROM #tmp2

You need a recursive query for this, because the running total has to be checked iteratively, row after row:
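with cte as (
-- anchor: the first visit starts the running total at 0
select
AccountNumber,
Date,
DateRank,
0 RollingTotal
from #tmp4
where DateRank = 1
union all
-- recursive step: add the days since the previous row, resetting to 0 once the total passes 90
select
t.AccountNumber,
t.Date,
t.DateRank,
case when RollingTotal + datediff(day, c.Date, t.Date) > 90
then 0
else RollingTotal + datediff(day, c.Date, t.Date)
end
from cte c
inner join #tmp4 t on t.DateRank = c.DateRank + 1
)
select * from cte
-- with more than 100 rows, append OPTION (MAXRECURSION 0) to lift the default recursion limit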
The anchor of the CTE selects the first record (as indicated by DateRank). Then the recursive part processes the rows one by one and resets the running count when it crosses 90.
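To sanity-check what the recursive query should return, here is a minimal procedural sketch of the same reset rule in Python. The function name `rolling_totals` is made up for illustration, and it assumes the dates are already sorted the way DateRank orders them. By date arithmetic the second row comes out as 36 days rather than 35.

```python
from datetime import date

def rolling_totals(visit_dates, threshold=90):
    """Running day count per visit; resets to 0 when it would exceed threshold."""
    totals = []
    running = 0
    previous = None
    for current in visit_dates:  # assumed sorted ascending, like DateRank
        if previous is None:
            running = 0  # first visit is always an index visit
        else:
            candidate = running + (current - previous).days
            running = 0 if candidate > threshold else candidate
        totals.append(running)
        previous = current
    return totals

visits = [
    date(2018, 11, 13), date(2018, 12, 19), date(2019, 1, 23), date(2019, 2, 5),
    date(2019, 3, 10), date(2019, 3, 20), date(2019, 4, 8), date(2019, 5, 20),
]
totals = rolling_totals(visits)
# DateRank values of the index visits (rows whose total is 0)
index_ranks = [rank for rank, total in enumerate(totals, start=1) if total == 0]
print(totals)       # [0, 36, 71, 84, 0, 10, 29, 71]
print(index_ranks)  # [1, 5]
```

Because each row depends on the (possibly reset) total of the row before it, the rule cannot be expressed as a plain windowed SUM, which is why the SQL version needs recursion.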

Related

Get sum of entries over last 6 months (incomplete months)

My data looks something like this
ProductNumber | YearMonth | Number
1 201803 1
1 201804 3
1 201810 6
2 201807 -3
2 201809 5
Now what I want to have is add an additional entry "6MSum" which is the sum of the last 6 months per ProductNumber (not the last 6 entries).
Please be aware the YearMonth data is not complete, for every ProductNumber there are gaps in between so I cant just use the last 6 entries for the sum. The final result should look something like this.
ProductNumber | YearMonth | Number | 6MSum
1 201803 1 1
1 201804 3 4
1 201810 6 9
2 201807 -3 -3
2 201809 5 2
Additionally I don't want to insert the sum to the table but instead use it in a query like:
SELECT [ProductNumber],[YearMonth],[Number],
6MSum = CONVERT(INT,SUM...)
FROM ...
I found a lot of solutions that use a "sum over period" but only for the last X entries and not for the actual conditional statement of "YearMonth within last 6 months".
Any help would be much appreciated!
It's a SQL database.
EDIT/Answer
It seems that the gaps between the months have to be filled with data first; afterwards something like
sum(Number) OVER (PARTITION BY category
ORDER BY year, week
ROWS 6 PRECEDING) AS [6MSum]
should work.
Reference to the solution: https://dba.stackexchange.com/questions/181773/sum-of-previous-n-number-of-columns-based-on-some-category
You could go the OUTER APPLY route. The following produces your required results exactly:
-- prep data
SELECT
ProductNumber , YearMonth , Number
into #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
-- output
SELECT
ProductNumber
,YearMonth
,Number
,[6MSum]
FROM #t t
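outer apply (
SELECT
sum(number) as [6MSum]
FROM #t it
where
it.ProductNumber = t.ProductNumber
-- compare month indexes (year * 12 + month); plain YYYYMM subtraction breaks across year boundaries
and (t.YearMonth / 100 * 12 + t.YearMonth % 100)
- (it.YearMonth / 100 * 12 + it.YearMonth % 100) between 0 and 6
) tt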
drop table #t
Use outer apply and convert yearmonth to a date, something like this:
with t as (
select t.*,
convert(date, convert(varchar(255), yearmonth) + '01') as ymd
from yourtable t
)
select t.*, t2.sum_6m
from t outer apply
(select sum(t2.number) as sum_6m
from t t2
where t2.productnumber = t.productnumber and
t2.ymd <= t.ymd and
t2.ymd >= dateadd(month, -6, t.ymd) -- >= keeps the month exactly six months back, as in the expected output
) t2;
Just to provide one more option. You can use DATEFROMPARTS to build valid dates from the YearMonth value and then search for values within date ranges.
Testable here: https://rextester.com/APJJ99843
SELECT
ProductNumber , YearMonth , Number
INTO #t
FROM ( values
(1, 201803 , 1 ),
(1, 201804 , 3 ),
(1, 201810 , 6 ),
(2, 201807 , -3 ),
(2, 201809 , 5 )
) s (ProductNumber , YearMonth , Number)
SELECT *
,[6MSum] = (SELECT SUM(number) FROM #t WHERE
ProductNumber = t.ProductNumber
AND DATEFROMPARTS(LEFT(YearMonth,4),RIGHT(YearMonth,2),1) --Build a valid start of month date
BETWEEN
DATEADD(MONTH,-6,DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Build a valid start of month date 6 months back
AND DATEFROMPARTS(LEFT(t.YearMonth,4),RIGHT(t.YearMonth,2),1)) --Start of the current row's month (upper bound of the range)
FROM #t t
DROP TABLE #t
So a working query (provided by a colleague of mine) can look like this:
SELECT [YearMonth]
,[Number]
,[ProductNumber]
, (SELECT SUM(Number)
FROM [...] DPDS_1
WHERE DPDS.ProductNumber = DPDS_1.ProductNumber
AND DPDS_1.YearMonth <= DPDS.YearMonth
AND DPDS_1.YearMonth >= CONVERT(int, LEFT(CONVERT(varchar, DATEADD(mm, -6, DPDS.YearMonth + '01'), 112), 6)))
FROM [...] DPDS
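For checking any of the queries in this thread against the expected output, here is a small Python sketch of the intended window. The helper names `month_index` and `six_month_sums` are invented for illustration. It assumes YearMonth is an integer in YYYYMM form and that "last 6 months" includes the month exactly six months back, which is what the expected 9 for 201810 implies.

```python
def month_index(year_month):
    """Turn an integer YYYYMM into a running month count (year * 12 + month)."""
    return (year_month // 100) * 12 + year_month % 100

def six_month_sums(rows, window=6):
    """rows: list of (product_number, year_month, number) tuples."""
    result = []
    for product, ym, number in rows:
        total = sum(
            n
            for p, other_ym, n in rows
            if p == product
            and 0 <= month_index(ym) - month_index(other_ym) <= window
        )
        result.append((product, ym, number, total))
    return result

rows = [
    (1, 201803, 1),
    (1, 201804, 3),
    (1, 201810, 6),
    (2, 201807, -3),
    (2, 201809, 5),
]
for row in six_month_sums(rows):
    print(row)
```

Working with a month index instead of subtracting the raw YYYYMM values matters once the data crosses a year boundary: 201901 minus 201812 is 89 as plain integers, but only 1 as a month difference.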

SQL: CTE query Speed

I am using SQL Server 2008 and am trying to increase the speed of my query below. The query assigns points to patients based on readmission dates.
Example: A patient is seen on 1/2, 1/5, 1/7, 1/8, 1/9, 2/4. I want to first group visits within 3 days of each other. 1/2-5 are grouped, 1/7-9 are grouped. 1/5 is NOT grouped with 1/7 because 1/5's actual visit date is 1/2. 1/7 would receive 3 points because it is a readmit from 1/2. 2/4 would also receive 3 points because it is a readmit from 1/7. When the dates are grouped the first date is the actual visit date.
Most articles suggest limiting the data set or adding indexes to increase speed. I have limited the number of rows to about 15,000 and added an index. When running the query with 45 test visit dates / 3 test patients, the query takes 1.5 min to run. With my actual data set it takes > 8 hrs.
How can I get this query to run < 1 hr? Is there a better way to write my query? Does my Index look correct? Any help would be greatly appreciated.
Example expected results below query.
;CREATE TABLE RiskReadmits(MRN INT, VisitDate DATE, Category VARCHAR(15))
;CREATE CLUSTERED INDEX Risk_Readmits_Index ON RiskReadmits(VisitDate)
;INSERT RiskReadmits(MRN,VisitDate,CATEGORY)
VALUES
(1, '1/2/2016','Inpatient'),
(1, '1/5/2016','Inpatient'),
(1, '1/7/2016','Inpatient'),
(1, '1/8/2016','Inpatient'),
(1, '1/9/2016','Inpatient'),
(1, '2/4/2016','Inpatient'),
(1, '6/2/2016','Inpatient'),
(1, '6/3/2016','Inpatient'),
(1, '6/5/2016','Inpatient'),
(1, '6/6/2016','Inpatient'),
(1, '6/8/2016','Inpatient'),
(1, '7/1/2016','Inpatient'),
(1, '8/1/2016','Inpatient'),
(1, '8/4/2016','Inpatient'),
(1, '8/15/2016','Inpatient'),
(1, '8/18/2016','Inpatient'),
(1, '8/28/2016','Inpatient'),
(1, '10/12/2016','Inpatient'),
(1, '10/15/2016','Inpatient'),
(1, '11/17/2016','Inpatient'),
(1, '12/20/2016','Inpatient')
;WITH a AS (
SELECT
z1.VisitDate
, z1.MRN
, (SELECT MIN(VisitDate) FROM RiskReadmits WHERE VisitDate > DATEADD(day, 3, z1.VisitDate)) AS NextDay
FROM
RiskReadmits z1
WHERE
CATEGORY = 'Inpatient'
), a1 AS (
SELECT
MRN
, MIN(VisitDate) AS VisitDate
, MIN(NextDay) AS NextDay
FROM
a
GROUP BY
MRN
), b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a
JOIN b
ON a.VisitDate = b.NextDay
), c AS (
SELECT
MRN,
VisitDate
, (SELECT MAX(VisitDate) FROM b WHERE b1.VisitDate > VisitDate AND b.MRN = b1.MRN) AS PreviousVisitDate
FROM
b b1
)
SELECT distinct
c1.MRN,
c1.VisitDate
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN PreviousVisitDate
ELSE NULL
END AS ReAdmissionFrom
, CASE
WHEN DATEDIFF(day,c1.PreviousVisitDate,c1.VisitDate) < 30 THEN 3
ELSE 0
END AS Points
FROM
c c1
ORDER BY c1.MRN
Expected Results:
MRN VisitDate ReAdmissionFrom Points
1 2016-01-02 NULL 0
1 2016-01-07 2016-01-02 3
1 2016-02-04 2016-01-07 3
1 2016-06-02 NULL 0
1 2016-06-06 2016-06-02 3
1 2016-07-01 2016-06-06 3
1 2016-08-01 NULL 0
1 2016-08-15 2016-08-01 3
1 2016-08-28 2016-08-15 3
1 2016-10-12 NULL 0
1 2016-11-17 NULL 0
1 2016-12-20 NULL 0
Oops, I changed the names of a few CTEs (and the post messed up the code formatting).
It should be like this:
b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
a1
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
a AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)
I'm going to take a wild guess here and say you want to change the b cte to
have AND a.MRN = b.MRN as a second condition in the second select query like this:
, b AS (
SELECT
VisitDate
, MRN
, NextDay
, 1 AS OrderRow
FROM
firstVisitAndFollowUp
UNION ALL
SELECT
a.VisitDate
, a.MRN
, a.NextDay
, b.OrderRow +1 AS OrderRow
FROM
visitsDistance3daysOrMore AS a
JOIN b
ON a.VisitDate = b.NextDay AND a.MRN = b.MRN
)
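To have something to compare the CTE output against, here is a rough single-pass sketch of the grouping and scoring rules in Python. The function names are invented, it handles the visits of one MRN at a time, and it assumes the rules are exactly those in the question: a new group starts at the first visit more than 3 days after the current group's first visit, and a group start counts as a readmit when it is less than 30 days after the previous group start.

```python
from datetime import date, timedelta

def index_visits(visit_dates, group_days=3):
    """Return the first visit date of each group of nearby visits."""
    starts = []
    for visit in sorted(visit_dates):
        if not starts or visit > starts[-1] + timedelta(days=group_days):
            starts.append(visit)
    return starts

def readmission_points(visit_dates, readmit_days=30, points=3):
    """Return (visit_date, readmission_from, points) per group start."""
    rows = []
    previous = None
    for start in index_visits(visit_dates):
        is_readmit = previous is not None and (start - previous).days < readmit_days
        rows.append((start, previous if is_readmit else None, points if is_readmit else 0))
        previous = start
    return rows

# Visit dates for MRN 1 from the question (year 2016)
visits = [date(2016, m, d) for m, d in [
    (1, 2), (1, 5), (1, 7), (1, 8), (1, 9), (2, 4), (6, 2), (6, 3), (6, 5),
    (6, 6), (6, 8), (7, 1), (8, 1), (8, 4), (8, 15), (8, 18), (8, 28),
    (10, 12), (10, 15), (11, 17), (12, 20),
]]
for visit_date, readmission_from, pts in readmission_points(visits):
    print(visit_date, readmission_from, pts)
```

The point of the sketch is that the rule only needs one ordered pass per patient, so whatever form the SQL takes, restricting every lookup to the same MRN and walking the dates in order should keep the work close to linear.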

Work out how many days it took from one status to another : SQL

Please feast your eyes on this current structure of our DB.
Our DBA is currently away for the next two weeks, I have very limited SQL knowledge, I like to stay with the UI and middle tier.
What we are trying to figure out is the following: we need a query that calculates the average period (in days) that commissions have taken to transition from ‘Verified’ to ‘Paid’ for a single dealer. The current statuses are:
Created
Verified
Rejected
Awaiting Payment
Paid
Refunded
I think this query needs to aim directly at the Commission History Table?
I'm not sure how I would go about writing such query due to the fact my knowledge on SQL is limited...
Any help would be great.
Here's a method to achieve what you're after, although it might not be the most efficient. It seems to me that this is more of a one-off query than something you're going to run frequently enough to impact database performance.
Test Table Setup:
CREATE TABLE Commission
(
CommissionId INT,
DealerId INT
)
CREATE TABLE CommissionHistory
(
CommissionId INT,
ActionDate DATETIME,
NewPaymentStatusId INT
)
Insert Dummy Data - 5 Commissions for 1 Dealer:
INSERT INTO dbo.Commission
( CommissionId ,
DealerId
)
VALUES ( 1 , 1 ),
( 2 , 1 ),
( 3 , 1 ),
( 4 , 1 ),
( 5 , 1 )
INSERT INTO dbo.CommissionHistory
( CommissionId ,
ActionDate ,
NewPaymentStatusId
)
VALUES ( 1 , GETDATE() -25, 1 ),
( 1 , GETDATE() -21, 2 ),
( 1 , GETDATE() -18, 3 ),
( 1 , GETDATE() -16, 4 ),
( 1 , GETDATE() -5, 5 ),
( 2 , GETDATE() -10, 1 ),
( 2 , GETDATE() -9, 2 ),
( 2 , GETDATE() -8, 3 ),
( 2 , GETDATE() -7, 4 ),
( 2 , GETDATE() -6, 5 ),
( 3 , GETDATE() -10, 1 ),
( 3 , GETDATE() -8, 2 ),
( 3 , GETDATE() -6, 3 ),
( 3 , GETDATE() -4, 4 ),
( 3 , GETDATE() -2, 5 ),
( 3 , GETDATE() -25, 6 ),
( 4 , GETDATE() -10, 1 ),
( 4 , GETDATE() -7, 2 ),
( 4 , GETDATE() -6, 3 ),
( 4 , GETDATE() -4, 4 ),
( 4 , GETDATE() -1, 5 ),
( 5 , GETDATE() -1, 1 ),
( 5 , GETDATE() -1, 2 )
So with the dummy data, Commissions 1, 2 & 4 are classified as valid records as they have statuses 2 and 5. 3 is excluded as it is refunded, and 5 is excluded as it's not paid.
To generate the averages I wrote the below query:
-- set the required dealer id
DECLARE @DealerId INT = 1
-- return all CommissionIds that have statuses 2 and 5, but not 6, into a temp table
SELECT DISTINCT CommissionId
INTO #DealerCommissions
FROM dbo.CommissionHistory t1
WHERE CommissionId IN (SELECT CommissionId
FROM dbo.Commission
WHERE DealerId = @DealerId)
AND NOT EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 6 AND t2.CommissionId = t1.CommissionId)
AND EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 2 AND t2.CommissionId = t1.CommissionId)
AND EXISTS (SELECT CommissionId
FROM dbo.CommissionHistory t2
WHERE t2.NewPaymentStatusId = 5 AND t2.CommissionId = t1.CommissionId)
-- use the temp table to return the average number of days between Verified (2) and Paid (5)
SELECT AVG(CAST(DaysToCompletion AS DECIMAL(10,2)))
FROM (
SELECT DATEDIFF(DAY, MIN(ch.ActionDate), MAX(ch.ActionDate)) DaysToCompletion
FROM #DealerCommissions dc
INNER JOIN dbo.CommissionHistory ch ON ch.CommissionId = dc.CommissionId
AND ch.NewPaymentStatusId IN (2, 5) -- ignore the other status changes
GROUP BY ch.CommissionId
) AS averageDays
-- remove temp table
DROP TABLE #DealerCommissions
For every commission in the history table you can get the max verified date and the min paid date, assuming the paid date is always later than the verified date. Then you can join the commission table and group by dealer id to get the average duration in days.
with comm as(
select
commissionid,
-- NewPaymentStatus is assumed to hold the status name
max(case NewPaymentStatus when 'Verified' then ActionDate else null end) as verified_date,
min(case NewPaymentStatus when 'Paid' then ActionDate else null end) as paid_date
-- using max or min just in case the same status is recorded more than once
from
CommissionHistory
group by
commissionid
)
select
c.DealerId,
avg(1.0 * datediff(day,comm.verified_date,comm.paid_date)) as avg_days
from
comm
inner join
commission c
on c.commissionid = comm.commissionid
where
datediff(day,comm.verified_date,comm.paid_date)>0
-- to get rid of the commissions with a paid date before or on the same day as the verified date
group by
c.DealerId
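As a cross-check for either query, this is a small Python sketch of the calculation for one dealer's history rows. The status ids (2 = Verified, 5 = Paid, 6 = Refunded) are taken from the dummy data above, the function name is invented, and the dates are fixed stand-ins for the GETDATE() offsets.

```python
from datetime import date, timedelta

VERIFIED, PAID, REFUNDED = 2, 5, 6  # ids assumed from the dummy data

def average_days_verified_to_paid(history):
    """history: iterable of (commission_id, action_date, status_id) for one dealer."""
    by_commission = {}
    for commission_id, action_date, status_id in history:
        by_commission.setdefault(commission_id, {}).setdefault(status_id, []).append(action_date)
    durations = []
    for statuses in by_commission.values():
        # only commissions that were verified and paid, and never refunded
        if VERIFIED in statuses and PAID in statuses and REFUNDED not in statuses:
            durations.append((min(statuses[PAID]) - max(statuses[VERIFIED])).days)
    return sum(durations) / len(durations) if durations else None

today = date(2020, 1, 31)  # arbitrary stand-in for GETDATE()
offsets = [
    (1, 25, 1), (1, 21, 2), (1, 18, 3), (1, 16, 4), (1, 5, 5),
    (2, 10, 1), (2, 9, 2), (2, 8, 3), (2, 7, 4), (2, 6, 5),
    (3, 10, 1), (3, 8, 2), (3, 6, 3), (3, 4, 4), (3, 2, 5), (3, 25, 6),
    (4, 10, 1), (4, 7, 2), (4, 6, 3), (4, 4, 4), (4, 1, 5),
    (5, 1, 1), (5, 1, 2),
]
history = [(cid, today - timedelta(days=days_ago), status) for cid, days_ago, status in offsets]
print(average_days_verified_to_paid(history))  # (16 + 3 + 6) / 3
```

Commissions 1, 2 and 4 contribute 16, 3 and 6 days, so a query that measures Verified to Paid on this data should return roughly 8.33.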

SQL query to calculate days worked per Month

I'm stuck on a SQL query. I'm using SQL Server.
Given a table that contains Jobs with a start and end date. These jobs can span days or months. I need to get the total combined number of days worked each month for all jobs that intersected those months.
Jobs
-----------------------------------
JobId | Start | End | DayRate |
-----------------------------------
1 | 1.1.13 | 2.2.13 | 2500 |
2 | 5.1.13 | 5.2.13 | 2000 |
3 | 3.3.13 | 2.4.13 | 3000 |
The results i need are:
Month | Days
--------------
Jan | 57
Feb | 7
Mar | 28
Apr | 2
Any idea how I would write such a query?
I would also like to work out the SUM for each month based on multiplying the dayrate by number of days worked for each job, how would i add this to the results ?
Thanks
You can use recursive CTE to extract all days from start to end for each JobID and then just group by month (and year I guess).
;WITH CTE_TotalDays AS
(
SELECT [Start] AS DT, JobID FROM dbo.Jobs
UNION ALL
SELECT DATEADD(DD,1,c.DT), c.JobID FROM CTE_TotalDays c
WHERE c.DT < (SELECT [End] FROM Jobs j2 WHERE j2.JobId = c.JobID)
)
SELECT
MONTH(DT) AS [Month]
,YEAR(DT) AS [Year]
,COUNT(*) AS [Days]
FROM CTE_TotalDays
GROUP BY MONTH(DT),YEAR(DT)
OPTION (MAXRECURSION 0)
PS: There are 58 days in Jan in your example and not 57 ;)
You can do it using following approach:
/* Your table with periods */
declare @table table(JobId int, Start date, [End] date, DayRate money)
INSERT INTO @table (JobId , Start, [End], DayRate)
VALUES
(1, '20130101','20130202', 2500),
(2,'20130105','20130205', 2000),
(3,'20130303','20130402' , 3000 )
/* create a table that stores all possible dates;
if this code is supposed to be executed often, you can create
the table with dates once to avoid the overhead of filling it */
declare @dates table(d date)
declare @d date='20000101'
WHILE @d<'20500101'
BEGIN
INSERT INTO @dates (d) VALUES (@d)
SET @d=DATEADD(DAY,1,@d)
END;
/* and at last get the desired output */
SELECT YEAR(d.d) [YEAR], DATENAME(month,d.d) [MONTH], COUNT(*) [Days]
FROM @dates d
CROSS JOIN @table t
WHERE d.d BETWEEN t.Start AND t.[End]
GROUP BY YEAR(d.d), DATENAME(month,d.d)
This uses only one recursive CTE over the whole date range instead of one recursion per job. I imagine it will perform better than the chosen answer when you have a large amount of data.
declare @t table(JobId int, Start date, [End] date, DayRate int)
insert @t values
(1,'2013-01-01','2013-02-02', 2500),(2,'2013-01-05','2013-02-05', 2000),(3,'2013-03-03', '2013-04-02',3000)
;WITH a AS
(
SELECT min(Start) s, max([End]) e
FROM @t
), b AS
(
SELECT s, e from a
UNION ALL
SELECT dateadd(day, 1, s), e
FROM b WHERE s <> e
)
SELECT
MONTH(b.s) AS [Month]
,YEAR(b.s) AS [Year]
,COUNT(*) AS [Days]
,SUM(DayRate) MonthDayRate
FROM b
join @t t
on b.s between t.Start and t.[End]
GROUP BY MONTH(b.s),YEAR(b.s)
OPTION (MAXRECURSION 0)
OPTION (MAXRECURSION 0)
Result:
Month Year Days MonthDayRate
1 2013 58 131500
2 2013 7 15000
3 2013 29 87000
4 2013 2 6000
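If you want to verify these numbers independently of SQL Server, the same day-by-day expansion can be sketched in a few lines of Python. The function name `days_per_month` is invented, and both the start and end date are assumed to count as worked days, as in the queries above.

```python
from collections import defaultdict
from datetime import date, timedelta

def days_per_month(jobs):
    """jobs: iterable of (start, end, day_rate); start and end are inclusive."""
    totals = defaultdict(lambda: [0, 0])  # (year, month) -> [days, amount]
    for start, end, day_rate in jobs:
        day = start
        while day <= end:
            bucket = totals[(day.year, day.month)]
            bucket[0] += 1
            bucket[1] += day_rate
            day += timedelta(days=1)
    return {key: tuple(value) for key, value in sorted(totals.items())}

jobs = [
    (date(2013, 1, 1), date(2013, 2, 2), 2500),
    (date(2013, 1, 5), date(2013, 2, 5), 2000),
    (date(2013, 3, 3), date(2013, 4, 2), 3000),
]
for (year, month), (days, amount) in days_per_month(jobs).items():
    print(month, year, days, amount)
```

It gives 58 days for January (31 from job 1 plus 27 from job 2) and 29 for March, matching the result table rather than the 57 and 28 in the question.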

Get percentiles of data-set with group by month

I have a SQL table with a whole load of records that look like this:
| Date | Score |
+ -----------+-------+
| 01/01/2010 | 4 |
| 02/01/2010 | 6 |
| 03/01/2010 | 10 |
...
| 16/03/2010 | 2 |
I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely.
Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix:
SELECT
YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month',
MIN(SCORE) MinScore,
AVG(SCORE) AverageScore,
MAX(SCORE) MaxScore
FROM SCORES
GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE)
ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE)
That's no problem so far.
The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is:
A line on the chart for the score (easy)
A line on the chart for the average (easy)
A line on the chart showing the band that 95% of the scores occupy (stumped)
It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly:
SELECT MAX(SubQ.SCORE) FROM
(SELECT TOP 45 PERCENT SCORE
FROM SCORES
WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
ORDER BY SCORE ASC) AS SubQ
SELECT MIN(SubQ.SCORE) FROM
(SELECT TOP 45 PERCENT SCORE
FROM SCORES
WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
ORDER BY SCORE DESC) AS SubQ
But I can't work out how to get a table of all the months.
| Date | Average | 45% | 55% |
+ -----------+---------+-----+-----+
| 01/01/2010 | 13 | 11 | 15 |
| 02/01/2010 | 10 | 8 | 12 |
| 03/01/2010 | 5 | 4 | 10 |
...
| 16/03/2010 | 7 | 7 | 9 |
At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.
Whew. This was a real brain teaser. First, my table schema for testing was:
Create Table Scores
(
Id int not null identity(1,1) primary key clustered
, [Date] datetime not null
, Score int not null
)
Now, first, I calculated the values using a CTE in SQL 2008 in order to check my answers and then I built a solution that should work in SQL 2000. So, in SQL 2008 we do something like:
;With
SummaryStatistics As
(
Select Year([Date]) As YearNum
, Month([Date]) As MonthNum
, Min(Score) As MinScore
, Max(Score) As MaxScore
, Avg(Score) As AvgScore
From Scores
Group By Month([Date]), Year([Date])
)
, Percentiles As
(
Select Year([Date]) As YearNum
, Month([Date]) As MonthNum
, Score
, NTile( 100 ) Over ( Partition By Month([Date]), Year([Date]) Order By Score ) As Percentile
From Scores
)
, ReportedPercentiles As
(
Select YearNum, MonthNum
, Min(Case When Percentile = 45 Then Score End) As Percentile45
, Min(Case When Percentile = 55 Then Score End) As Percentile55
From Percentiles
Where Percentile In(45,55)
Group By YearNum, MonthNum
)
Select SS.YearNum, SS.MonthNum
, SS.MinScore, SS.MaxScore, SS.AvgScore
, RP.Percentile45, RP.Percentile55
From SummaryStatistics As SS
Join ReportedPercentiles As RP
On RP.YearNum = SS.YearNum
And RP.MonthNum = SS.MonthNum
Order By SS.YearNum, SS.MonthNum
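One thing worth checking with the NTILE approach is how it behaves for months with few rows. The sketch below mimics the NTILE(100) bucket assignment in Python for a single month; the helper names are invented and it is only an approximation for experimenting, not SQL Server's implementation. As far as I know, when a partition has fewer rows than buckets, each row simply gets its own bucket number, so buckets 45 and 55 only exist if the month has at least that many scores.

```python
def ntile(row_count, buckets):
    """Bucket number per ordered row: earlier buckets take the extra rows."""
    base, extra = divmod(row_count, buckets)
    tiles = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return tiles

def percentile_scores(scores, wanted=(45, 55)):
    """Lowest score in each wanted bucket, or None if the bucket is empty."""
    ordered = sorted(scores)
    tiles = ntile(len(ordered), 100)
    result = {}
    for percentile in wanted:
        matches = [s for s, t in zip(ordered, tiles) if t == percentile]
        result[percentile] = min(matches) if matches else None
    return result

print(percentile_scores(range(1, 201)))  # 200 scores: two rows per bucket
print(percentile_scores(range(1, 31)))   # 30 scores: buckets 45 and 55 are empty
```

With the inner join in the query above, a month whose buckets 45 and 55 are both empty would drop out of the result, so a LEFT JOIN may be the safer choice if some months are sparse.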
Now for a SQL 2000 solution. In essence, the trick is to use a couple of temporary tables to tally the occurrences of the scores.
If object_id('tempdb..#Working') is not null
DROP TABLE #Working
GO
Create Table #Working
(
YearNum int not null
, MonthNum int not null
, Score int not null
, Occurances int not null
, Constraint PK_#Working Primary Key Clustered ( MonthNum, YearNum, Score )
)
GO
Insert #Working(MonthNum, YearNum, Score, Occurances)
Select Month([Date]), Year([Date]), Score, Count(*)
From Scores
Group By Month([Date]), Year([Date]), Score
GO
If object_id('tempdb..#SummaryStatistics') is not null
DROP TABLE #SummaryStatistics
GO
Create Table #SummaryStatistics
(
MonthNum int not null
, YearNum int not null
, Score int not null
, Occurances int not null
, CumulativeTotal int not null
, Percentile float null
, Constraint PK_#SummaryStatistics Primary Key Clustered ( MonthNum, YearNum, Score )
)
GO
Insert #SummaryStatistics(YearNum, MonthNum, Score, Occurances, CumulativeTotal)
Select W2.YearNum, W2.MonthNum, W2.Score, W2.Occurances, Sum(W1.Occurances)-W2.Occurances
From #Working As W1
Join #Working As W2
On W2.YearNum = W1.YearNum
And W2.MonthNum = W1.MonthNum
Where W1.Score <= W2.Score
Group By W2.YearNum, W2.MonthNum, W2.Score, W2.Occurances
Update #SummaryStatistics
Set Percentile = SS.CumulativeTotal * 100.0 / MonthTotal.Total
From #SummaryStatistics As SS
Join (
Select SS1.YearNum, SS1.MonthNum, Max(SS1.CumulativeTotal) As Total
From #SummaryStatistics As SS1
Group By SS1.YearNum, SS1.MonthNum
) As MonthTotal
On MonthTotal.YearNum = SS.YearNum
And MonthTotal.MonthNum = SS.MonthNum
Select GeneralStats.*, Percentiles.Percentile45, Percentiles.Percentile55
From (
Select Year(S1.[Date]) As YearNum
, Month(S1.[Date]) As MonthNum
, Min(S1.Score) As MinScore
, Max(S1.Score) As MaxScore
, Avg(S1.Score) As AvgScore
From Scores As S1
Group By Month(S1.[Date]), Year(S1.[Date])
) As GeneralStats
Join (
Select SS1.YearNum, SS1.MonthNum
, Min(Case When SS1.Percentile >= 45 Then Score End) As Percentile45
, Min(Case When SS1.Percentile >= 55 Then Score End) As Percentile55
From #SummaryStatistics As SS1
Group By SS1.YearNum, SS1.MonthNum
) As Percentiles
On Percentiles.YearNum = GeneralStats.YearNum
And Percentiles.MonthNum = GeneralStats.MonthNum
Without the data, I'm not sure if I'm doing this right, but maybe this will help get you there with two queries per year instead of 24...
SELECT MAX(SubQ.SCORE), MyMonth FROM
(SELECT TOP 45 PERCENT SCORE , MONTH(SCOREDATE) as MyMonth
FROM SCORES
WHERE YEAR(SCOREDATE) = 2010
ORDER BY SCORE ASC) AS SubQ
group by MyMonth