I'm trying to use row_number to calculate median, lower quartile, and upper quartile for a box plot chart. However, my row_number sort is off because of ties.
Here is some sample data:
CREATE TABLE EStats
(
PersonID VARCHAR(30) NOT NULL,
Grade VARCHAR(25) NOT NULL,
CourseDate Date NOT NULL
);
INSERT INTO EStats
(
PersonID, Grade, CourseDate
)
VALUES
('100', '91', '2010-03-01'),
('101', '96', '2010-03-01'),
('102', '88', '2010-03-01'),
('103', '92', '2010-03-01'),
('104', '81', '2010-03-01'),
('105', '85', '2010-03-01'),
('106', '91', '2010-03-01'),
('107', '89', '2010-03-01'),
('108', '99', '2010-03-01'),
('109', '88', '2010-03-01'),
('110', '81', '2011-03-02'),
('111', '77', '2011-03-02'),
('112', '88', '2011-03-02'),
('113', '76', '2011-03-02'),
('114', '69', '2011-03-02'),
('115', '70', '2011-03-02'),
('116', '75', '2011-03-02'),
('117', '88', '2011-03-02'),
('118', '76', '2011-03-02'),
('119', '95', '2012-03-01'),
('120', '96', '2012-03-01'),
('121', '90', '2012-03-01'),
('122', '80', '2012-03-01'),
('123', '85', '2012-03-01'),
('124', '94', '2012-03-01'),
('125', '89', '2012-03-01'),
('126', '97', '2012-03-01'),
('127', '94', '2012-03-01'),
('128', '72', '2012-03-01'),
('129', '88', '2012-03-01'),
('130', '91', '2012-03-01')
Here is one of my inner queries that shows the sort not working:
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
Notice that for CourseDate 2010-03-01 the RowAsc does this:
10
9
8
6
7
5
3
4
2
1
However, I need all of the rows to have a number in sequential order so that I can calculate median in the case where an even amount of numbers exists. (Rank and dense_rank don't work because of the "gaps" they leave).
Actually, below is the entire thing. Again, I'm trying to calculate median, lower quartile, upper quartile, min, and max for a box plot chart. ANY help is really appreciated!
WITH Q3 AS
(
SELECT
CourseDate,
AVG(CAST(Grade AS Numeric)) AS Median
FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
--ORDER BY CourseDate
),
Q2 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS LowerQuartile
FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade < Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
),
Q4 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS UpperQuartile
FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade > Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
)
SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate
Try this to get the median:
select avg(case when seqnum*2 = totnum+1 then col
when seqnum*2 in (totnum, totnum + 2) then col
end)
from (select t.*, row_number() over (order by col) as seqnum,
count(*) over () as totnum
from t
) t
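It looks arcane, but the idea is to average the two middle values when the row count is even and to take the single middle value when it is odd. If using SQL Server, recall that AVG over an integer column does integer arithmetic, so cast col to a decimal type first if you need a fractional result. You can actually simplify the above to: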
select avg(case when seqnum*2 in (totnum, totnum+1, totnum+2) then col end)
This works because an odd total count matches only totnum+1, while an even total count matches the other two values.
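To see why the `seqnum*2 in (totnum, totnum+1, totnum+2)` test picks out the middle row(s), here is a rough Python sketch of the same selection. The function name `median_by_row_number` is made up for illustration, and the grade lists are the 2010 and 2011 values from the question; this is not the SQL itself, just the same arithmetic.

```python
from statistics import median


def median_by_row_number(values):
    """Average the rows whose doubled 1-based position hits totnum, totnum+1 or totnum+2."""
    ordered = sorted(values)          # plays the role of ROW_NUMBER() OVER (ORDER BY col)
    totnum = len(ordered)             # plays the role of COUNT(*) OVER ()
    picked = [
        value
        for seqnum, value in enumerate(ordered, start=1)
        if seqnum * 2 in (totnum, totnum + 1, totnum + 2)
    ]
    return sum(picked) / len(picked)  # assumes at least one input row


grades_2010 = [91, 96, 88, 92, 81, 85, 91, 89, 99, 88]  # 10 rows: two middle rows are averaged
grades_2011 = [81, 77, 88, 76, 69, 70, 75, 88, 76]      # 9 rows: one middle row is used

print(median_by_row_number(grades_2010), median(grades_2010))
print(median_by_row_number(grades_2011), median(grades_2011))
```

Ties do not matter here because equal values are interchangeable once they sit in the middle positions.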
I have calculated average values for each month. Some months are NULL, and my manager wants me to fill those months using the previous row's value and the following month's value.
Current result (see below pic):
Expected Result
DECLARE @DATE DATE = '2017-01-01';
WITH DATEDIM AS
(
SELECT DISTINCT DTM.FirstDayOfMonth
FROM DATEDIM DTM
WHERE Date >= '01/01/2017'
AND Date <= DATEADD(mm,-1,Getdate())
),
Tab1 AS
(
SELECT
T1.FirstDayOfMonth AS MONTH_START,
AVG1,
ROW_NUMBER() OVER (
ORDER BY DATEADD(MM,DATEDIFF(MM, 0, T1.FirstDayOfMonth),0) DESC
) AS RNK
FROM DATEDIM T1
LEFT OUTER JOIN (
SELECT
DATEADD(MM,DATEDIFF(MM, 0, StartDate),0) MONTH_START,
AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) AS AVG1
FROM DATATable
WHERE EndDate >= StartDate
AND StartDate >= @DATE
AND EndDate >= @DATE
GROUP BY DATEADD(MM,DATEDIFF(MM, 0, StartDate),0)
) T2 ON T1.FirstDayOfMonth = T2.MONTH_START
)
SELECT *
FROM Tab1
Using your CTEs
select MONTH_START,
case when AVG1 is null then
(select top(1) t2.AVG1
from Tab1 t2
where t1.RNK > t2.RNK and t2.AVG1 is not null
order by t2.RNK desc)
else AVG1 end AVG1,
RNK
from Tab1 t1
Edit
Version for an average of the nearest preceding and nearest following non-NULL values. Both must exist, otherwise NULL is returned.
select MONTH_START,
case when AVG1 is null then
( (select top(1) t2.AVG1
from Tab1 t2
where t1.RNK > t2.RNK and t2.AVG1 is not null
order by t2.RNK desc)
+(select top(1) t2.AVG1
from Tab1 t2
where t1.RNK < t2.RNK and t2.AVG1 is not null
order by t2.RNK)
) / 2
else AVG1 end AVG1,
RNK
from Tab1 t1
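For anyone who wants to check the intended behaviour outside the database, here is a small Python sketch of the same rule: a NULL month becomes the average of the nearest non-NULL value on each side, and stays NULL if either side is missing. The function name and the sample values are made up for illustration.

```python
def fill_with_neighbour_average(values):
    """Replace each None with the mean of the nearest non-None value before and after it."""
    filled = []
    for i, value in enumerate(values):
        if value is not None:
            filled.append(value)
            continue
        # Nearest non-None value on each side, looking at the ORIGINAL list only.
        prev_val = next((x for x in reversed(values[:i]) if x is not None), None)
        next_val = next((x for x in values[i + 1:] if x is not None), None)
        if prev_val is None or next_val is None:
            filled.append(None)       # both neighbours must exist
        else:
            filled.append((prev_val + next_val) / 2)
    return filled


monthly_avg = [None, 20, 30, None, None, None, 40, None, None, 36, 22]  # hypothetical data
print(fill_with_neighbour_average(monthly_avg))
```

Every month in a run of consecutive NULLs gets the same filled value, because all of them see the same two neighbours.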
I can't quite tell what you are trying to calculate the average of, but this is quite simple with window functions:
select t.*,
avg(val) over (order by month_start rows between 1 preceding and 1 following)
from t;
In your case, I think this translates as:
select datefromparts(year(startdate), month(startdate), 1) as float,
avg(val) as monthaverage,
avg(avg(val)) over (order by min(startdate) rows between 1 preceding and 1 following)
from datatable d
where . . .
group by datefromparts(year(startdate), month(startdate), 1)
You can manipulate previous and following row values using window functions:
SELECT MAX(row_value) OVER(
ORDER BY ... ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS Previous_Value,
MAX(row_value) OVER(
ORDER BY ... ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS Next_Value
Alternatively you can use LAG/LEAD functions and modify your sub-query where you get the AVG:
SELECT
src.MONTH_START,
CASE
WHEN src.prev_val IS NULL OR src.next_val IS NULL
THEN COALESCE(src.prev_val, src.next_val) -- Return non-NULL value (if exists)
ELSE (src.prev_val + src.next_val ) / 2
END AS AVG_new
FROM (
SELECT
DATEADD(MM,DATEDIFF(MM, 0, StartDate),0) MONTH_START,
LAG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) OVER(ORDER BY ...) AS prev_val,
LEAD(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) OVER(ORDER BY ...) AS next_val
-- AVG(CAST(DATEDIFF(dd, StartDate, EndDate) AS FLOAT)) AS AVG1
FROM DATATable
WHERE EndDate >= StartDate
AND StartDate >= @DATE
AND EndDate >= @DATE
GROUP BY DATEADD(MM,DATEDIFF(MM, 0, StartDate),0)
) AS src
I haven't tested it, but give it a shot and see how it works. You may need to put at least one column in the ORDER BY portion of the window function.
You could try this query (my sample data reflects only the relevant parts; I omitted the date column):
declare @tbl table (rank int, value int);
insert into @tbl values
(1, null),
(2, 20),
(3, 30),
(4, null),
(5, null),
(6, null),
(7, 40),
(8, null),
(9, null),
(10, 36),
(11, 22);
;with cte as (
select *,
DENSE_RANK() over (order by case when value is null then rank else value end) drank,
case when value is null then lag(value) over (order by rank) end lag,
case when value is null then lead(value) over (order by rank) end lead
from @tbl
)
select rank, value, case when value is null then
max(lag) over (partition by grp) / 2 +
max(lead) over (partition by grp) / 2
else value end valueWithAvg
from (
select *,
rank - drank grp from cte
) a order by rank
I have the following problem: from the table of pays and dues, I need to find the date of the last overdue. Here is the table and data for example:
create table t (
Id int
, [date] date
, Customer varchar(6)
, Deal varchar(6)
, Currency varchar(3)
, [Sum] int
);
insert into t values
(1, '2017-12-12', '1110', '111111', 'USD', 12000)
, (2, '2017-12-25', '1110', '111111', 'USD', 5000)
, (3, '2017-12-13', '1110', '122222', 'USD', 10000)
, (4, '2018-01-13', '1110', '111111', 'USD', -10100)
, (5, '2017-11-20', '2200', '222221', 'USD', 25000)
, (6, '2017-12-20', '2200', '222221', 'USD', 20000)
, (7, '2017-12-31', '2201', '222221', 'USD', -10000)
, (8, '2017-12-29', '1110', '122222', 'USD', -10000)
, (9, '2017-11-28', '2201', '222221', 'USD', -30000);
If the value of "Sum" is positive - it means overdue has begun; if "Sum" is negative - it means someone paid on this Deal.
In the example above on Deal '122222' overdue starts at 2017-12-13 and ends on 2017-12-29, so it shouldn't be in the result.
And for the Deal '222221' the first overdue of 25000, which started at 2017-11-20, was completely paid at 2017-11-28, so the last date of current overdue (we are interested in) is 2017-12-31
I've made this selection to sum up all the payments, and stuck here :(
WITH cte AS (
SELECT *,
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
Apparently I need to find (for each Deal) the minimum of Dates if there is no 0 or negative Debt_balance, and the next date after the last 0 balance otherwise.
I will be grateful for any tips and ideas on the subject.
Thanks!
UPDATE
My version of solution:
WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY Deal, [Date]) id,
Deal, [Date], [Sum],
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
SELECT a.Deal,
SUM(a.Sum) AS NET_Debt,
isnull(max(b.date), min(a.date)),
datediff(day, isnull(max(b.date), min(a.date)), getdate())
FROM cte as a
LEFT OUTER JOIN cte AS b
ON a.Deal = b.Deal AND a.Debt_balance <= 0 AND b.Id=a.Id+1
GROUP BY a.Deal
HAVING SUM(a.Sum) > 0
I believe you want to compute a running sum and track when it changes from zero or below to positive. That can happen several times per deal, and you want the last date on which it happened. You need LAG() in addition to the running sum:
WITH cte1 AS (
-- running balance column
SELECT *
, SUM([Sum]) OVER (PARTITION BY Deal ORDER BY [Date], Id) AS RunningBalance
FROM t
), cte2 AS (
-- overdue begun column - set whenever running balance changes from l.t.e. zero to g.t. zero
SELECT *
, CASE WHEN LAG(RunningBalance, 1, 0) OVER (PARTITION BY Deal ORDER BY [Date], Id) <= 0 AND RunningBalance > 0 THEN 1 END AS OverdueBegun
FROM cte1
)
-- eliminate groups that are paid i.e. sum = 0
SELECT Deal, MAX(CASE WHEN OverdueBegun = 1 THEN [Date] END) AS RecentOverdueDate
FROM cte2
GROUP BY Deal
HAVING SUM([Sum]) <> 0
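If it helps to trace the logic by hand, here is a rough Python sketch of the same idea: keep a running balance per deal, remember the date each time it moves from zero or below to above zero, and drop deals whose total is zero. The function name `last_overdue_start` is made up, and the rows are the sample data from the question reduced to (Id, date, Deal, Sum).

```python
from collections import defaultdict
from datetime import date

rows = [
    (1, date(2017, 12, 12), "111111", 12000),
    (2, date(2017, 12, 25), "111111", 5000),
    (3, date(2017, 12, 13), "122222", 10000),
    (4, date(2018, 1, 13), "111111", -10100),
    (5, date(2017, 11, 20), "222221", 25000),
    (6, date(2017, 12, 20), "222221", 20000),
    (7, date(2017, 12, 31), "222221", -10000),
    (8, date(2017, 12, 29), "122222", -10000),
    (9, date(2017, 11, 28), "222221", -30000),
]


def last_overdue_start(rows):
    """Per deal: last date on which the running balance went from <= 0 to > 0."""
    by_deal = defaultdict(list)
    for row_id, day, deal, amount in rows:
        by_deal[deal].append((day, row_id, amount))

    result = {}
    for deal, entries in by_deal.items():
        entries.sort()                    # ORDER BY [Date], Id
        balance = 0
        last_start = None
        for day, _row_id, amount in entries:
            previous = balance            # what LAG(RunningBalance, 1, 0) would return
            balance += amount
            if previous <= 0 < balance:
                last_start = day
        if balance != 0:                  # HAVING SUM([Sum]) <> 0
            result[deal] = last_start
    return result


print(last_overdue_start(rows))
```

On this data the sketch keeps deals 111111 and 222221 and drops 122222, whose payments cancel out. For 222221 it reports 2017-12-20, the date the running balance last crossed above zero.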
You can use window functions. These can calculate intermediate values:
Last day when the sum is negative (i.e. last "good" record).
Last sum
Then you can combine these:
select deal, min(date) as last_overdue_start_date
from (select t.*,
first_value(sum) over (partition by deal order by date desc) as last_sum,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where last_sum > 0 and date > max_date_neg
group by deal;
Actually, the value on the last date is not necessary. So this simplifies to:
select deal, min(date) as last_overdue_start_date
from (select t.*,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where date > max_date_neg
group by deal;
How do I get the following result highlighted in yellow?
Essentially I want a calculated field which increments by 1 when VeganOption = 1 and is zero when VeganOption = 0
I have tried using the following query but using partition continues to increment after a zero. I'm a bit stuck on this one.
SELECT [UniqueId]
,[Meal]
,[VDate]
,[VeganOption]
, row_number() over (partition by [VeganOption] order by [UniqueId])
FROM [Control]
order by [UniqueId]
Table Data:
CREATE TABLE Control
([UniqueId] int, [Meal] varchar(10), [VDate] datetime, [VeganOption] int);
INSERT INTO Control ([UniqueId], [Meal], [VDate], [VeganOption])
VALUES
('1', 'Breakfast',' 2018-08-01 00:00:00', 1),
('2', 'Lunch',' 2018-08-01 00:00:00', 1),
('3', 'Dinner',' 2018-08-01 00:00:00', 1),
('4', 'Breakfast',' 2018-08-02 00:00:00', 1),
('5', 'Lunch',' 2018-08-02 00:00:00', 0),
('6', 'Dinner',' 2018-08-02 00:00:00', 0),
('7', 'Breakfast',' 2018-08-03 00:00:00', 1),
('8', 'Lunch',' 2018-08-03 00:00:00', 1),
('9', 'Dinner',' 2018-08-03 00:00:00', 1),
('10', 'Breakfast',' 2018-08-04 00:00:00', 0),
('11', 'Lunch',' 2018-08-04 00:00:00', 1),
('12', 'Dinner',' 2018-08-04 00:00:00', 1)
;
This is for SQL Server 2016+
You could create subgroups using SUM and then ROW_NUMBER:
WITH cte AS (
SELECT [UniqueId]
,[Meal]
,[VDate]
,[VeganOption]
,sum(CASE WHEN VeganOption = 1 THEN 0 ELSE 1 END)
over (order by [UniqueId]) AS grp --switching 0 <-> 1
FROM [Control]
)
SELECT *,CASE WHEN VeganOption =0 THEN 0
ELSE ROW_NUMBER() OVER(PARTITION BY veganOption, grp ORDER BY [UniqueId])
END AS VeganStreak -- main group and calculated subgroup
FROM cte
order by [UniqueId];
This is a variant on gaps-and-islands.
I like to define streaks using the difference of row numbers. This looks like
select c.*,
(case when veganoption = 1
then row_number() over (partition by veganoption, seqnum - seqnum_v order by uniqueid)
else 0
end) as veganstreak
from (select c.*,
row_number() over (partition by veganoption order by uniqueid) as seqnum_v,
row_number() over (order by uniqueid) as seqnum
from Control c
) c;
Why this works is a bit hard to explain. But, if you look at the results of the subquery, you'll see how the difference of row numbers defines the streaks you want to identify. The rest is just applying row_number() to enumerate the meals.
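A quick way to convince yourself is to compute the two row numbers by hand. The Python sketch below does that for the VeganOption column of the sample data (in UniqueId order). Within a run of equal values both counters advance together, so their difference stays constant, and it changes as soon as the run is interrupted. The function name `vegan_streaks` is made up for illustration.

```python
def vegan_streaks(flags):
    """Number each run of 1s from 1 upward; rows with flag 0 get 0."""
    seen_per_value = {}   # how many rows with this flag so far (seqnum_v)
    island_sizes = {}     # rows seen so far in each (flag, seqnum - seqnum_v) island
    streaks = []
    for seqnum, flag in enumerate(flags, start=1):
        seqnum_v = seen_per_value.get(flag, 0) + 1
        seen_per_value[flag] = seqnum_v
        island = (flag, seqnum - seqnum_v)   # same key as PARTITION BY veganoption, seqnum - seqnum_v
        island_sizes[island] = island_sizes.get(island, 0) + 1
        streaks.append(island_sizes[island] if flag == 1 else 0)
    return streaks


vegan_option = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1]
print(vegan_streaks(vegan_option))
```

Note that the flag is part of the island key. The difference alone can repeat between a run of 0s and a run of 1s, for example in the sequence 1, 0, 1.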
One method is to use a CTE to define your groupings, and then do a further ROW_NUMBER() on those, resulting in:
WITH Grps AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY UniqueID ASC) -
ROW_NUMBER() OVER (PARTITION BY VeganOption ORDER BY UniqueID ASC) AS Grp
FROM Control)
SELECT *,
CASE VeganOption WHEN 0 THEN 0 ELSE ROW_NUMBER() OVER (PARTITION BY VeganOption, Grp ORDER BY UniqueID ASC) END AS VeganStreak
FROM Grps
ORDER BY UniqueId;
Test data:
CREATE TABLE #Products
(Product VARCHAR(100), BeginDate DATETIME, EndDate DATETIME NULL, Rate INT);
INSERT INTO #Products (Product, BeginDate, EndDate, Rate)
VALUES ('Football', '01-01-1982', '05-03-2011', 2),
('Football', '05-04-2011', '08-01-2012', 1),
('Football', '08-02-2012', '01-01-2013', 2),
('Football', '01-02-2013', NULL, 3),
('Eggs', '01-01-1982', '05-03-2011', 1),
('Eggs', '05-04-2011', '08-01-2012', 1),
('Eggs', '08-02-2012', NULL, 1),
('Potato', '01-01-1982', '05-03-2011', 1),
('Potato', '05-04-2011', '08-01-2012', 1),
('Potato', '08-02-2012', '08-01-2013', 2),
('Potato', '08-02-2013', '08-01-2014', 2),
('Potato', '08-02-2014', '08-01-2015', 3),
('Potato', '08-02-2015', NULL, 3);
Expected result:
CREATE TABLE #Results
(Product VARCHAR(100), BeginDate DATETIME, EndDate DATETIME NULL, Rate INT);
INSERT INTO #Results (Product, BeginDate, EndDate, Rate)
VALUES ('Football', '01-01-1982', '05-03-2011', 2),
('Football', '05-04-2011', '08-01-2012', 1),
('Football', '08-02-2012', '01-01-2013', 2),
('Football', '01-02-2013', NULL, 3),
('Eggs', '01-01-1982', NULL, 1),
('Potato', '01-01-1982', '08-01-2012', 1),
('Potato', '08-02-2012', '08-01-2014', 2),
('Potato', '08-02-2014', NULL, 3);
I want to group by the product and rate columns, but skip grouping if the rate isn't continuous. Take Football in the given test data: although there are two rows with a Rate of 2, they shouldn't be grouped because there was a different rate for a time period in between. The BeginDate value will always be 1 day after the previous EndDate.
I tried group by but that didn't work.
This is an islands problem, one possible solution
SELECT Product, min(BeginDate), EndDate, rate
FROM (
SELECT Product, BeginDate, rate
,last_value(EndDate) over(partition by Product, Rate order by BeginDate
rows between unbounded preceding and unbounded following) EndDate
,row_number() over(partition by Product order by BeginDate) - row_number() over(partition by Product, Rate order by BeginDate) grp
FROM #Products
) t
GROUP BY Product, grp, EndDate, rate
ORDER BY Product, min(BeginDate)
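Result (note that the first Football row gets EndDate 01.01.2013 instead of the expected 03.05.2011, because last_value is partitioned by Product and Rate rather than by grp, so both rate-2 Football islands share one EndDate):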
Product (No column name) EndDate rate
Eggs 01.01.1982 00:00:00 NULL 1
Football 01.01.1982 00:00:00 01.01.2013 00:00:00 2
Football 04.05.2011 00:00:00 01.08.2012 00:00:00 1
Football 02.08.2012 00:00:00 01.01.2013 00:00:00 2
Football 02.01.2013 00:00:00 NULL 3
Potato 01.01.1982 00:00:00 01.08.2012 00:00:00 1
Potato 02.08.2012 00:00:00 01.08.2014 00:00:00 2
Potato 02.08.2014 00:00:00 NULL 3
You can use lag to get the previous row's endDate and Rate and use a case expression to start a new group when the specified conditions aren't met. Use sum() over() to assign groups. Thereafter, you can use first_value window function to get the first beginDate, last endDate and the rate per product,group.
select distinct product,
first_value(begindate) over(partition by product,grp order by beginDate),
first_value(enddate) over(partition by product,grp order by beginDate desc),
max(rate) over(partition by product,grp)
from
(select p.*,
sum(case when datediff(day,prevEnd,beginDate)=1 and prevRate=Rate then 0 else 1 end)
over(partition by product order by beginDate) as grp
from
(select p.*,
lag(endDate,1,endDate) over(partition by product order by beginDate) as prevEnd,
lag(Rate,1,Rate) over(partition by product order by beginDate) as prevRate
from #Products p
) p
) p
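The lag-then-running-sum approach can be traced with a short Python sketch: walk the rows per product in BeginDate order and start a new group whenever the rate changes or the period does not begin the day after the previous one ended. The function name `merge_rate_periods` is made up, and the rows are the Potato test data, with the dates read as month-day-year.

```python
from datetime import date, timedelta

potato = [
    ("Potato", date(1982, 1, 1), date(2011, 5, 3), 1),
    ("Potato", date(2011, 5, 4), date(2012, 8, 1), 1),
    ("Potato", date(2012, 8, 2), date(2013, 8, 1), 2),
    ("Potato", date(2013, 8, 2), date(2014, 8, 1), 2),
    ("Potato", date(2014, 8, 2), date(2015, 8, 1), 3),
    ("Potato", date(2015, 8, 2), None, 3),
]


def merge_rate_periods(rows):
    """Collapse consecutive periods of one product that share a rate and touch day to day."""
    merged = []
    for product, begin, end, rate in sorted(rows, key=lambda r: (r[0], r[1])):
        if merged:
            last_product, last_begin, last_end, last_rate = merged[-1]
            contiguous = (
                last_product == product
                and last_rate == rate
                and last_end is not None
                and begin - last_end == timedelta(days=1)
            )
            if contiguous:
                # Same island: keep the first BeginDate, take the latest EndDate (may be None).
                merged[-1] = (product, last_begin, end, rate)
                continue
        merged.append((product, begin, end, rate))
    return merged


for row in merge_rate_periods(potato):
    print(row)
```

An open-ended period (EndDate NULL) stays open after merging, which is what the expected result shows for the last row of each product.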
You can use Row_Number and query as below:
Select top (1) with ties * from (
SElect *, RowN = Row_number() over (partition by Product order by begindate) - Row_number() over (partition by product,rate order by begindate)
from #Products
) a order by row_number() over(partition by Product, Rate, RowN order by BeginDate)
I think this does it
select *
from ( select *
, lag(Rate, 1) over(partition by product order by beginDate) as prevRate
, lag(Product, 1) over(partition by product order by beginDate) as prevProduct
from #Products
) lag
where ( rate <> prevRate or prevRate is null ) and product = isnull(prevProduct, product)
order by Product, BeginDate
I'd like a running distinct count with a partition by year for the following data:
DROP TABLE IF EXISTS #FACT;
CREATE TABLE #FACT("Year" INT,"Month" INT, "Acc" varchar(5));
INSERT INTO #FACT
values
(2015, 1, 'A'),
(2015, 1, 'B'),
(2015, 1, 'B'),
(2015, 1, 'C'),
(2015, 2, 'D'),
(2015, 2, 'E'),
(2015, 3, 'E'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C');
SELECT * FROM #FACT;
The following returns the correct answer but is there a more concise way that is also performant?
WITH
dnsRnk AS
(
SELECT
"Year"
, "Month"
, DenseR = DENSE_RANK() OVER(PARTITION BY "Year", "Month" ORDER BY "Acc")
FROM #FACT
),
mxPerMth AS
(
SELECT
"Year"
, "Month"
, RunningTotal = MAX(DenseR)
FROM dnsRnk
GROUP BY
"Year"
, "Month"
)
SELECT
"Year"
, "Month"
, X = SUM(RunningTotal) OVER (PARTITION BY "Year" ORDER BY "Month")
FROM mxPerMth
ORDER BY
"Year"
, "Month";
The above returns the following - the answer should also return exactly the same table:
If you want a running count of distinct accounts:
SELECT f.*,
sum(case when seqnum = 1 then 1 else 0 end) over (partition by year order by month) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by acc order by year, month) as seqnum
FROM #fact f
) f;
This counts each account during the first month when it appears.
EDIT:
Oops. The above doesn't aggregate by year and month and then start over for each year. Here is the correct solution:
SELECT
year
,month
,sum( sum(case when seqnum = 1 then 1 else 0 end)
) over (partition by year order by month) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by acc, year order by month) as seqnum
FROM #fact f
) f
group by year, month
order by year, month;
And, SQL Fiddle isn't working but the following is an example:
with FACT as (
SELECT yyyy, mm, account
FROM (values
(2015, 1, 'A'),
(2015, 1, 'B'),
(2015, 1, 'B'),
(2015, 1, 'C'),
(2015, 2, 'D'),
(2015, 2, 'E'),
(2015, 3, 'E'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C')) v(yyyy, mm, account)
)
SELECT
yyyy
,mm
,sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by yyyy order by mm) as cume_distinct_acc
FROM (
SELECT
f.*
,row_number() over (partition by account, yyyy order by mm) as seqnum
FROM fact f
) f
group by yyyy, mm
order by yyyy, mm;
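To check what the `seqnum = 1` trick counts, here is a rough Python sketch of the same logic: an account contributes 1 in the first month it appears within a year, and those contributions are accumulated month by month, restarting each year. The function name `running_distinct_per_year` is made up, and the tuples are the sample rows from the question.

```python
facts = [
    (2015, 1, "A"), (2015, 1, "B"), (2015, 1, "B"), (2015, 1, "C"),
    (2015, 2, "D"), (2015, 2, "E"),
    (2015, 3, "E"),
    (2016, 1, "A"), (2016, 1, "A"),
    (2016, 2, "B"), (2016, 2, "C"),
]


def running_distinct_per_year(facts):
    """Map (year, month) to the number of distinct accounts seen so far in that year."""
    seen = set()
    new_per_month = {}
    for year, month, acc in sorted(facts):        # earliest month of each account comes first
        new_per_month.setdefault((year, month), 0)
        if (year, acc) not in seen:               # equivalent of seqnum = 1
            seen.add((year, acc))
            new_per_month[(year, month)] += 1

    running = {}
    totals = {}
    for (year, month), new_accounts in sorted(new_per_month.items()):
        totals[year] = totals.get(year, 0) + new_accounts   # running sum, reset per year
        running[(year, month)] = totals[year]
    return running


for key, value in running_distinct_per_year(facts).items():
    print(key, value)
```

Account E shows up in both month 2 and month 3 of 2015, so it is counted once, in month 2, and the running total for month 3 does not increase.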
Demo Here:
;with cte as (
SELECT "Year", "Month", count(distinct "Acc") as cnt
FROM #fact
GROUP BY "Year", "Month"
)
SELECT
"Year"
,"Month"
,sum(cnt) over (Partition by "Year" order by "Year", "Month" rows unbounded preceding ) as x
FROM cte