Setting a flag for score change in SQL

I have a table with exam scores for different weeks. I want to add an extra column that flags the change from the previous score: 1 if the score decreased by 0-5, 2 if it decreased by 5-9, 3 if it decreased by 10 or more, and 4 if it increased. Here is the sample data in the table.
--DROP TABLE #Scores
CREATE TABLE #Scores (
NAME varchar(10),
Grade varchar(10),
Subject varchar(25),
Exam_Date datetime,
Score int
)
INSERT INTO #Scores
VALUES ('Sam', 'XI', 'Maths', '2016-08-01 15:47:29.533', 38),
('Sam', 'XI', 'Maths', '2016-07-25 15:47:29.533', 50),
('Mike', 'XI', 'Maths', '2016-08-01 15:47:29.533', 50),
('Mike', 'XI', 'Maths', '2016-07-25 15:47:29.533', 45)
SELECT * FROM #Scores
Thanks in advance

You would use lag() and case:
select s.*,
(case when score - prev_score < 0 then 4
when score - prev_score <= 5 then 1
when score - prev_score <= 9 then 2
else 3
end) as score_diff
from (select s.*,
lag(score) over (partition by name, subject order by exam_date) as prev_score
from #scores s
) s;

Thanks to @Gordon Linoff. I changed the code a little: the logic is right, only the arithmetic needed adjusting.
select s.*,
(case when prev_score is null then null -- first exam: no previous score to compare
when score - prev_score > 0 then 4 -- score increased
when score - prev_score between -5 and 0 then 1
when score - prev_score between -9 and -6 then 2
else 3
end) as score_diff
from (select s.*,
lag(score) over (partition by name, subject order by exam_date) as prev_score
from #scores s
) s;
Result is captured and shown below:
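A minimal sketch for checking the flags on the four sample rows. It uses Python's built-in sqlite3 module instead of SQL Server (an assumption, made only so the example is self-contained), and the lowercase table name `scores` is a stand-in for `#Scores`. Rows with no earlier exam get NULL here, which is also an assumption about the desired output.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, grade TEXT, subject TEXT, exam_date TEXT, score INTEGER);
INSERT INTO scores VALUES
  ('Sam',  'XI', 'Maths', '2016-08-01 15:47:29', 38),
  ('Sam',  'XI', 'Maths', '2016-07-25 15:47:29', 50),
  ('Mike', 'XI', 'Maths', '2016-08-01 15:47:29', 50),
  ('Mike', 'XI', 'Maths', '2016-07-25 15:47:29', 45);
""")

query = """
SELECT name, score,
       CASE WHEN prev_score IS NULL THEN NULL      -- first exam: nothing to compare
            WHEN score - prev_score > 0 THEN 4     -- increased
            WHEN score - prev_score >= -5 THEN 1   -- decreased by 0-5
            WHEN score - prev_score >= -9 THEN 2   -- decreased by 6-9
            ELSE 3                                 -- decreased by 10 or more
       END AS score_diff
FROM (SELECT s.*,
             LAG(score) OVER (PARTITION BY name, subject ORDER BY exam_date) AS prev_score
      FROM scores s) t
ORDER BY name, exam_date
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Sam drops from 50 to 38, so his later row gets flag 3; Mike rises from 45 to 50, so his later row gets flag 4.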

Consider a further step of normalization: keep the scores in a separate table and relate the student to the scores table.
You have to decide how to reference the previous score when comparing it with the current one. Either store the change from the last score in an additional field and use a calculated field to show the current score, or store the previous score in a field alongside the new score and use a calculated field to show the change between the two.

Related

Find date of most recent overdue

I have the following problem: from the table of pays and dues, I need to find the date of the last overdue. Here is the table and data for example:
create table t (
Id int
, [date] date
, Customer varchar(6)
, Deal varchar(6)
, Currency varchar(3)
, [Sum] int
);
insert into t values
(1, '2017-12-12', '1110', '111111', 'USD', 12000)
, (2, '2017-12-25', '1110', '111111', 'USD', 5000)
, (3, '2017-12-13', '1110', '122222', 'USD', 10000)
, (4, '2018-01-13', '1110', '111111', 'USD', -10100)
, (5, '2017-11-20', '2200', '222221', 'USD', 25000)
, (6, '2017-12-20', '2200', '222221', 'USD', 20000)
, (7, '2017-12-31', '2201', '222221', 'USD', -10000)
, (8, '2017-12-29', '1110', '122222', 'USD', -10000)
, (9, '2017-11-28', '2201', '222221', 'USD', -30000);
If the value of "Sum" is positive, an overdue has begun; if "Sum" is negative, someone paid on this Deal.
In the example above, the overdue on Deal '122222' starts on 2017-12-13 and ends on 2017-12-29, so it shouldn't be in the result.
For Deal '222221', the first overdue of 25000, which started on 2017-11-20, was completely paid on 2017-11-28, so the last date of the current overdue (the one we are interested in) is 2017-12-31.
I've written this query to sum up all the payments, but I'm stuck here :(
WITH cte AS (
SELECT *,
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
Apparently I need to find, for each Deal, the minimum of the Dates if there is no zero or negative Debt_balance, and otherwise the next date after the last zero balance.
I'd be grateful for any tips and ideas on the subject.
Thanks!
UPDATE
My version of solution:
WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY Deal, [Date]) id,
Deal, [Date], [Sum],
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
SELECT a.Deal,
SUM(a.Sum) AS NET_Debt,
isnull(max(b.date), min(a.date)),
datediff(day, isnull(max(b.date), min(a.date)), getdate())
FROM cte as a
LEFT OUTER JOIN cte AS b
ON a.Deal = b.Deal AND a.Debt_balance <= 0 AND b.Id=a.Id+1
GROUP BY a.Deal
HAVING SUM(a.Sum) > 0
I believe you are trying to use a running sum and track when it changes to positive. It can change to positive multiple times, and you want the last date on which it became positive. You need LAG() in addition to the running sum:
WITH cte1 AS (
-- running balance column
SELECT *
, SUM([Sum]) OVER (PARTITION BY Deal ORDER BY [Date], Id) AS RunningBalance
FROM t
), cte2 AS (
-- overdue begun column - set whenever running balance changes from l.t.e. zero to g.t. zero
SELECT *
, CASE WHEN LAG(RunningBalance, 1, 0) OVER (PARTITION BY Deal ORDER BY [Date], Id) <= 0 AND RunningBalance > 0 THEN 1 END AS OverdueBegun
FROM cte1
)
-- eliminate groups that are paid i.e. sum = 0
SELECT Deal, MAX(CASE WHEN OverdueBegun = 1 THEN [Date] END) AS RecentOverdueDate
FROM cte2
GROUP BY Deal
HAVING SUM([Sum]) <> 0
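To see what the running-balance plus LAG() approach returns for the sample rows, here is a rough sketch run through Python's built-in sqlite3 module rather than SQL Server (an assumption for portability). The column names `event_date` and `amount` are hypothetical replacements for `[date]` and `[Sum]`, chosen to avoid quoting keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, event_date TEXT, customer TEXT, deal TEXT,
                currency TEXT, amount INTEGER);
INSERT INTO t VALUES
  (1, '2017-12-12', '1110', '111111', 'USD',  12000),
  (2, '2017-12-25', '1110', '111111', 'USD',   5000),
  (3, '2017-12-13', '1110', '122222', 'USD',  10000),
  (4, '2018-01-13', '1110', '111111', 'USD', -10100),
  (5, '2017-11-20', '2200', '222221', 'USD',  25000),
  (6, '2017-12-20', '2200', '222221', 'USD',  20000),
  (7, '2017-12-31', '2201', '222221', 'USD', -10000),
  (8, '2017-12-29', '1110', '122222', 'USD', -10000),
  (9, '2017-11-28', '2201', '222221', 'USD', -30000);
""")

query = """
WITH cte1 AS (
  -- running balance per deal
  SELECT *, SUM(amount) OVER (PARTITION BY deal ORDER BY event_date, id) AS running_balance
  FROM t
), cte2 AS (
  -- mark rows where the balance moves from <= 0 to > 0
  SELECT *,
         CASE WHEN LAG(running_balance, 1, 0)
                     OVER (PARTITION BY deal ORDER BY event_date, id) <= 0
               AND running_balance > 0 THEN 1 END AS overdue_begun
  FROM cte1
)
SELECT deal, MAX(CASE WHEN overdue_begun = 1 THEN event_date END) AS recent_overdue_date
FROM cte2
GROUP BY deal
HAVING SUM(amount) <> 0   -- drop deals that are fully paid
ORDER BY deal
"""
recent_overdue = dict(conn.execute(query).fetchall())
print(recent_overdue)
```

Deal '122222' nets to zero and drops out; for '222221' the balance goes negative on 2017-11-28 and turns positive again on 2017-12-20, which is the date this query reports.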
You can use window functions. These can calculate intermediate values:
Last day when the sum is negative (i.e. last "good" record).
Last sum
Then you can combine these:
select deal, min(date) as last_overdue_start_date
from (select t.*,
first_value(sum) over (partition by deal order by date desc) as last_sum,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where last_sum > 0 and date > max_date_neg
group by deal;
Actually, the value on the last date is not necessary. So this simplifies to:
select deal, min(date) as last_overdue_start_date
from (select t.*,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where date > max_date_neg
group by deal;

Splitting up group by with relevant aggregates beyond the basic ones?

I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me than it would to someone else, so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate | EndDate  | MoodRating
----------+----------+-----------
1/1/1991  | 3/1/1991 | 7
1/2/1991  | 3/2/1991 | 7
1/3/1991  | 3/3/1991 | 4
1/4/1991  | 3/4/1991 | 4
1/5/1991  | 3/5/1991 | 7
1/6/1991  | 3/6/1991 | 7
1/7/1991  | 3/7/1991 | 4
1/8/1991  | 3/8/1991 | 4
1/9/1991  | 3/9/1991 | 4
And the after:
MinStart | MaxEnd   | Good/Bad
---------+----------+---------
1/1/1991 | 3/2/1991 | good
1/3/1991 | 3/4/1991 | bad
1/5/1991 | 3/6/1991 | good
1/7/1991 | 3/9/1991 | bad
Currently my query with the group by rating would show:
MinStart | MaxEnd   | Good/Bad
---------+----------+---------
1/1/1991 | 3/6/1991 | good
1/3/1991 | 3/9/1991 | bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)
INSERT INTO @MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE @MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT,
        @PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
    SELECT StartDate, EndDate, MoodRating
    FROM @MoodRanges
    ORDER BY StartDate -- walk the ranges in date order so neighbours are compared
OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @PreviousStartDate IS NOT NULL
    BEGIN
        -- same category as the previous row: extend the current range
        IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
            OR (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
        BEGIN
            UPDATE @MoodResults
            SET EndDate = @CurrentEndDate
            WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
        END
        ELSE
        BEGIN
            -- category changed: start a new range
            INSERT INTO
                @MoodResults
            VALUES
                (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
        END
    END
    ELSE
    BEGIN
        -- very first row: seed the results table
        INSERT INTO
            @MoodResults
        VALUES
            (@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
    END
    SET @PreviousStartDate = @CurrentStartDate
    SET @PreviousEndDate = @CurrentEndDate
    SET @PreviousMoodScore = @CurrentMoodScore
    FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
END
CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM @MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 every 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN numbers the rows 1 - 60 and then repeats,
-- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
-- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
-- Verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;
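The row-number-difference idea behind cte_CompactGroup can be tried directly on the nine sample ranges from the question. This is only a sketch: it runs in SQLite through Python's built-in sqlite3 module (an assumption, not the SQL Server setup above), stores dates as ISO strings, and the names `ranges`, `flag`, and `island` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranges (start_date TEXT, end_date TEXT, mood INTEGER);
INSERT INTO ranges VALUES
  ('1991-01-01', '1991-03-01', 7), ('1991-01-02', '1991-03-02', 7),
  ('1991-01-03', '1991-03-03', 4), ('1991-01-04', '1991-03-04', 4),
  ('1991-01-05', '1991-03-05', 7), ('1991-01-06', '1991-03-06', 7),
  ('1991-01-07', '1991-03-07', 4), ('1991-01-08', '1991-03-08', 4),
  ('1991-01-09', '1991-03-09', 4);
""")

query = """
WITH flagged AS (
  SELECT start_date, end_date,
         CASE WHEN mood >= 5 THEN 'good' ELSE 'bad' END AS flag,
         ROW_NUMBER() OVER (ORDER BY start_date) AS rn_all
  FROM ranges
), grouped AS (
  -- overall row number minus per-flag row number stays constant inside
  -- one unbroken run of the same flag
  SELECT *, rn_all - ROW_NUMBER() OVER (PARTITION BY flag ORDER BY start_date) AS island
  FROM flagged
)
SELECT MIN(start_date) AS min_start, MAX(end_date) AS max_end, flag
FROM grouped
GROUP BY flag, island
ORDER BY min_start
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

On this data the query yields the four alternating good/bad ranges from the "after" table in the question, without a cursor.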

Count previous consecutive rows in SQL Server

I have an attendance data list, shown below. I am trying to query a specific date range (01/05/2016 – 07/05/2016) with a Total Present column, calculated from the preceding present (P) records. Suppose today is 04/05/2016: if a person has status 'P' on the 01, 02, 03 and 04, the row for 04-05-2016 should show a total present of 4.
Could you help me find the total present from this result set?
You can check this example, which has the logic to calculate the running sum of previous values.
declare @t table (employeeid int, datecol date, status varchar(2) )
insert into @t values (10001, '01-05-2016', 'P'),
(10001, '02-05-2016', 'P'),
(10001, '03-05-2016', 'P'),
(10001, '04-05-2016', 'P'),
(10001, '05-05-2016', 'A'),
(10001, '06-05-2016', 'P'),
(10001, '07-05-2016', 'P'),
(10001, '08-05-2016', 'L'),
(10002, '07-05-2016', 'P'),
(10002, '08-05-2016', 'L')
--select * from @t
select * ,
SUM(case when status = 'P' then 1 else 0 end) OVER (PARTITION BY employeeid ORDER BY employeeid, datecol
ROWS BETWEEN UNBOUNDED PRECEDING
AND current row) as TotalPresent
from
@t
Another twist on the same thing via a CTE (as you mentioned SQL Server 2012: the solution below only works in SQL Server 2012 and above)
;with cte as
(
select employeeid , datecol , ROW_NUMBER() over(partition by employeeid order by employeeid, datecol) rowno
from
@t where status = 'P'
)
select t.*, cte.rowno ,
case when ( isnull(cte.rowno, 0) = 0)
then LAG(cte.rowno) OVER (ORDER BY t.employeeid, t.datecol)
else cte.rowno
end LagValue
from @t t left join cte on t.employeeid = cte.employeeid and t.datecol = cte.datecol
order by t.employeeid, t.datecol
You could use a subquery to calculate TotalPresent for each row:
SELECT
main.EmployeeID,
main.[Date],
main.[Status],
(
SELECT SUM(CASE WHEN t.[Status] = 'P' THEN 1 ELSE 0 END)
FROM [TableName] t
WHERE t.EmployeeID = main.EmployeeID AND t.[Date] <= main.[Date]
) as TotalPresent
FROM [TableName] main
ORDER BY
main.EmployeeID,
main.[Date]
Here I used a subquery to sum over the records that have the same EmployeeID and a date less than or equal to the date of the current row. If the status of a record is 'P', 1 is added to the sum, otherwise 0, so only records with status P are counted.
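The correlated subquery described above and the windowed SUM from the earlier answer should produce the same running totals. A small sketch to compare them, using Python's built-in sqlite3 module instead of SQL Server (an assumption), with a hypothetical `attendance` table and ISO dates (1 May 2016 onward) in place of the '01-05-2016' strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (employee_id INTEGER, att_date TEXT, status TEXT);
INSERT INTO attendance VALUES
  (10001, '2016-05-01', 'P'), (10001, '2016-05-02', 'P'),
  (10001, '2016-05-03', 'P'), (10001, '2016-05-04', 'P'),
  (10001, '2016-05-05', 'A'), (10001, '2016-05-06', 'P'),
  (10001, '2016-05-07', 'P'), (10001, '2016-05-08', 'L'),
  (10002, '2016-05-07', 'P'), (10002, '2016-05-08', 'L');
""")

# Variant 1: running total with a window function.
windowed = conn.execute("""
SELECT employee_id, att_date,
       SUM(CASE WHEN status = 'P' THEN 1 ELSE 0 END)
         OVER (PARTITION BY employee_id ORDER BY att_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total_present
FROM attendance
ORDER BY employee_id, att_date
""").fetchall()

# Variant 2: the same total via a correlated subquery.
subquery = conn.execute("""
SELECT m.employee_id, m.att_date,
       (SELECT SUM(CASE WHEN a.status = 'P' THEN 1 ELSE 0 END)
        FROM attendance a
        WHERE a.employee_id = m.employee_id AND a.att_date <= m.att_date) AS total_present
FROM attendance m
ORDER BY m.employee_id, m.att_date
""").fetchall()

totals = [row[2] for row in windowed]
print(totals)
print(windowed == subquery)
```

Both variants keep a cumulative count of 'P' rows per employee; neither resets the count after an absence, so a true "consecutive days" count would need extra grouping.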
Interesting question, this should work:
select *
, (select count(retail) from p g
where g.date <= p.date and g.id = p.id and retail = 'P')
from p
order by ID, Date;
So I believe I understand correctly: you would like to count the occurrences of P per ID, date by date.
This makes a lot of sense. That is why the first occurrence of ID2 was L and the Total is 0. This query counts the P statuses row by row for each ID and holds the total at non-P rows.

Histogram: Counting orders with variable bins in SQL

I have a table containing orders, items, and prices. I am trying to generate histograms for each item based on the prices.
Create Table #Customer_Pricing
(
customer_id int,
item_id VARCHAR(10),
qty DECIMAL(5,2),
price DECIMAL(5,2),
)
;
GO
-- Insert Statements
Insert into #Customer_Pricing values(128456, 'SOM 555', 8, 2.50)
Insert into #Customer_Pricing values(123856, 'SOM 554', 1, 2.50)
Insert into #Customer_Pricing values(123456, 'SOM 554', 55, 2.00)
Insert into #Customer_Pricing values(123556, 'SOM 555', 2, 2.20)
Insert into #Customer_Pricing values(123456, 'SOM 553', 12, 2.13)
;
For each item, I wanted 3 bins so I determined the bin sizes by dividing the difference of the MAX-MIN by 3, then adding that value to the MIN.
WITH Stats_Table_CTE (item_id2,max_p, min_p, int_p, r1_upper, r2_lower, r2_upper, r3_lower)
AS
( SELECT item_id
,max(price)
,min(price)
,(max(price) - min(price))/3
,min(price)+(max(price) - min(price))/3-0.01
,min(price)+(max(price) - min(price))/3
,min(price)+((max(price) - min(price))/3)*2-0.01
,min(price)+((max(price) - min(price))/3)*2
FROM #Customer_Pricing
GROUP BY item_id)
Now, I need to count the frequencies for each range and each item. I have attempted to do so by using SUM(CASE...) but was unsuccessful.
SELECT item_id
,SUM(CASE WHEN price <= r1_upper, THEN 1 ELSE 0 END) AS r1_count
,SUM(CASE WHEN price >= r2_lower AND <= r2_upper, THEN 1 ELSE 0 END) AS r2_count
,SUM(CASE WHEN price >= r3_lower, THEN 1 ELSE 0 END) AS r3_count
FROM Stats_Table_CTE
GROUP BY item_id
I also attempted to use COUNT in the form
SELECT item_id, price
count(price <= r1_upper) AS r1_count.... but I got stuck
In one attempt, I INNER JOINed the #Customer_Pricing table and Stats_Table_CTE but didn't know where to go from there.
Ideally, I would like the output table to appear as follows: *This is not the actual data, but I included it to show the desired format of the output.
Item ID min_p r1_upper (r2 bins) r3_lower max_p r1_count r2_ct
SOM 553 2.00 2.16 saving space 2.33 2.50 2 1
SOM 554 2.13 2.48 2.88 3.25 1 0
SOM 555 2.31 2.51 2.72 2.92 3 2
*The format of the output table is off, but I have item ID, the bins, and the counts across the top grouped by item
Here is my recommendation:
WITH Stats_Table_CTE AS (
SELECT item_id, max(price) as maxprice, min(price) as minprice,
(max(price) - min(price))/3 as binsize
FROM #Customer_Pricing
GROUP BY item_id
)
SELECT cp.item_id,
SUM(CASE WHEN price < minprice + binsize THEN 1 ELSE 0
END) AS r1_count,
SUM(CASE WHEN price >= minprice + binsize AND price < minprice+ 2*binsize
THEN 1 ELSE 0
END) AS r2_count,
SUM(CASE WHEN price >= minprice + 2*binsize
THEN 1 ELSE 0
END) AS r3_count
FROM #Customer_Pricing cp JOIN
Stats_Table_CTE st
ON st.item_id = cp.item_id
GROUP BY cp.item_id
The important part is the join back to #Customer_Pricing. Also important is the simplification of the logic -- you can define the bounds for the bins and use <, rather than having a lower and upper bound for each one. Also, your query had some syntax errors in it.
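For a quick check of the join-back approach on the five sample rows, here is a hedged sketch run in SQLite via Python's built-in sqlite3 module (an assumption; the original uses a temp table in another engine). The table name `customer_pricing` is a stand-in for `#Customer_Pricing`, and prices are stored as REAL rather than DECIMAL(5,2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_pricing (customer_id INTEGER, item_id TEXT, qty REAL, price REAL);
INSERT INTO customer_pricing VALUES
  (128456, 'SOM 555',  8, 2.50),
  (123856, 'SOM 554',  1, 2.50),
  (123456, 'SOM 554', 55, 2.00),
  (123556, 'SOM 555',  2, 2.20),
  (123456, 'SOM 553', 12, 2.13);
""")

query = """
WITH stats AS (
  -- per-item minimum and bin width (a third of the price range)
  SELECT item_id, MIN(price) AS minprice, (MAX(price) - MIN(price)) / 3 AS binsize
  FROM customer_pricing
  GROUP BY item_id
)
SELECT cp.item_id,
       SUM(CASE WHEN price < minprice + binsize THEN 1 ELSE 0 END) AS r1_count,
       SUM(CASE WHEN price >= minprice + binsize
                 AND price < minprice + 2 * binsize THEN 1 ELSE 0 END) AS r2_count,
       SUM(CASE WHEN price >= minprice + 2 * binsize THEN 1 ELSE 0 END) AS r3_count
FROM customer_pricing cp
JOIN stats st ON st.item_id = cp.item_id
GROUP BY cp.item_id
ORDER BY cp.item_id
"""
histogram = conn.execute(query).fetchall()
for row in histogram:
    print(row)
```

Note the edge case for 'SOM 553': with a single price the bin width is zero, so the row lands in the third bin because only that bin uses `>=` without an upper bound.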
Note that in many databases, the CTE would not be necessary because you could just use window functions. Your question is not tagged with the database (although I could guess what it is), so that change seems unwarranted.

Count multiple repeats after event as single repeat

What I'm trying to do is come up with a single query that can give the percentage of repeats within 30 days of an initial event, but only count any events within 30 days as a single repeat. Here's a sample data set for a single person:
Person Date
══════════════
A 3/1/14
A 3/21/14
A 3/29/14
A 4/14/14
A 4/17/14
In this case, 3/21 would be the repeat event, and 3/29 wouldn't be counted as a second. 4/14 would be the start of the next window, with 4/17 being the second repeat.
To calculate the percentage of repeats here, the numerator would be the distinct count of people who had an initial event in the month and also had a subsequent event within 30 days. The denominator is a distinct count of people with events in that month. In the case of crossing months, the repeat is counted within the month of the initial event.
I know I could come up with something that uses a loop/cursor or temp table, but as the data set grows, it's going to take forever. Does anyone have any thoughts on how to do this as a single query? It's probably going to involve a couple of CTE's. Everything I've come up with so far has failed.
Nice one... try this:
create table #t (Person varchar(10), EventDate date);
insert #t (Person, EventDate)
values
('A', '3/1/14'),
('A', '3/21/14'),
('A', '3/29/14'),
('A', '4/14/14'),
('A', '4/17/14'),
('A', '8/3/14'),
('B', '3/25/14'),
('B', '4/2/14'),
('B', '4/20/14'),
('B', '6/14/14'),
('B', '8/17/14'),
('B', '8/26/14');
;WITH OrderedEvents AS (
SELECT Person, EventDate, ROW_NUMBER() OVER (PARTITION BY Person ORDER BY EventDate) AS Ord
FROM #t
)
, RepeatedEvents AS (
SELECT Person, EventDate, Ord, EventDate AS InitialDate
FROM OrderedEvents
WHERE Ord = 1
UNION ALL
SELECT o.Person, o.EventDate, o.Ord
, CASE WHEN DATEDIFF(DAY, r.InitialDate, o.EventDate) > 30 THEN o.EventDate ELSE r.InitialDate END
FROM OrderedEvents o
JOIN RepeatedEvents r ON o.Person = r.Person AND o.Ord = r.Ord + 1
)
, GroupedEvents AS (
SELECT Person, MONTH(InitialDate) AS Mth, YEAR(InitialDate) AS Yr
, IsRepeat = CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END
FROM RepeatedEvents
GROUP BY Person, MONTH(InitialDate), YEAR(InitialDate)
)
SELECT Mth, Yr, CAST(SUM(IsRepeat) AS NUMERIC) / CAST(COUNT(DISTINCT person) AS NUMERIC) AS Pct
FROM GroupedEvents
GROUP BY Mth, Yr;
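A rough port of the recursive CTE to SQLite, run through Python's built-in sqlite3 module, to see what it produces for the twelve sample events. The port is an assumption-laden sketch: `julianday()` replaces DATEDIFF, `strftime('%Y-%m', ...)` replaces the MONTH/YEAR pair, and the `events` table and lowercase column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (person TEXT, event_date TEXT);
INSERT INTO events VALUES
  ('A', '2014-03-01'), ('A', '2014-03-21'), ('A', '2014-03-29'),
  ('A', '2014-04-14'), ('A', '2014-04-17'), ('A', '2014-08-03'),
  ('B', '2014-03-25'), ('B', '2014-04-02'), ('B', '2014-04-20'),
  ('B', '2014-06-14'), ('B', '2014-08-17'), ('B', '2014-08-26');
""")

query = """
WITH RECURSIVE ordered AS (
  SELECT person, event_date,
         ROW_NUMBER() OVER (PARTITION BY person ORDER BY event_date) AS ord
  FROM events
), repeated AS (
  -- anchor: each person's first event opens the first 30-day window
  SELECT person, event_date, ord, event_date AS initial_date
  FROM ordered
  WHERE ord = 1
  UNION ALL
  -- step: an event more than 30 days after the window start opens a new window
  SELECT o.person, o.event_date, o.ord,
         CASE WHEN julianday(o.event_date) - julianday(r.initial_date) > 30
              THEN o.event_date ELSE r.initial_date END
  FROM ordered o
  JOIN repeated r ON o.person = r.person AND o.ord = r.ord + 1
), grouped AS (
  SELECT person, strftime('%Y-%m', initial_date) AS ym,
         CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END AS is_repeat
  FROM repeated
  GROUP BY person, strftime('%Y-%m', initial_date)
)
SELECT ym, SUM(is_repeat) * 1.0 / COUNT(DISTINCT person) AS pct
FROM grouped
GROUP BY ym
ORDER BY ym
"""
repeat_pct = conn.execute(query).fetchall()
for row in repeat_pct:
    print(row)
```

For person A this groups 3/21 and 3/29 under the 3/1 window and 4/17 under the 4/14 window, matching the description in the question; August comes out at 0.5 because only B has a repeat there.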