Running Sum that resets to 0 on each new cluster of consecutive categories - sql

I have tried and failed to adapt several running sum methods (remember I have to use SQL Server 2008, so it's a bit trickier than in 2012).
The goal is to have a running sum of Amount ordered by Date. Any time the Category field changes value in that list, the sum should restart.
Table structure:
[Date], [Category], [Amount]
Example:
[Date], [Category], [Amount], [RunSumReset]
-------------------------------------------
1-Jan, catA, 10, 10
2-Jan, catA, 5, 15
3-Jan, catA, 15, 30
15-Jan, catB, 3, 3
1-Feb, catB, 6, 9
11-Feb, catA, 10, 10
12-Feb, catC, 2, 2
1-Apr, catA, 5, 5
Thanks so much for any slick tips or tricks

Using version 2008 makes things a bit trickier, since the windowed version of SUM with an ORDER BY clause is not available (it was introduced in SQL Server 2012).
One way to do it is:
WITH CTE AS (
SELECT [Date], Category, Amount,
ROW_NUMBER() OVER (ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY Category
ORDER BY [Date]) AS grp
FROM mytable
)
SELECT [Date], Category, Amount, Amount + COALESCE(t.s, 0) AS RunSumReset
FROM CTE AS c1
OUTER APPLY (
SELECT SUM(c2.Amount)
FROM CTE AS c2
WHERE c2.[Date] < c1.[Date] AND
c1.Category = c2.Category AND
c1.grp = c2.grp) AS t(s)
ORDER BY [Date]
The CTE calculates a field grp that identifies islands of consecutive records having the same Category: once Category changes, the grp value also changes. grp is only unique in combination with Category, which is why both are compared inside the OUTER APPLY. Using this CTE we can calculate the running total the way it is normally done in versions prior to SQL Server 2012, i.e. using OUTER APPLY.
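To see why the difference of the two ROW_NUMBER values marks each island, here is a plain-Python model of the same arithmetic (a sketch, not the answer's code). It assumes unique dates, written as hypothetical sortable "MM-DD" strings since the question gives no year, and accumulates each (Category, grp) total incrementally instead of re-summing earlier rows as OUTER APPLY does; the numbers come out the same.

```python
from collections import defaultdict

# Sample data from the question: (date key, Category, Amount).
# Dates are written as sortable "MM-DD" strings and assumed unique.
SAMPLE = [
    ("01-01", "catA", 10), ("01-02", "catA", 5), ("01-03", "catA", 15),
    ("01-15", "catB", 3), ("02-01", "catB", 6), ("02-11", "catA", 10),
    ("02-12", "catC", 2), ("04-01", "catA", 5),
]


def running_sum_with_reset(rows):
    """Return (date, category, amount, grp, running_sum) per row, ordered by date."""
    per_category_rn = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY Category ORDER BY [Date])
    totals = defaultdict(int)           # running total per (Category, grp) island
    result = []
    for overall_rn, (day, category, amount) in enumerate(sorted(rows), start=1):
        per_category_rn[category] += 1
        grp = overall_rn - per_category_rn[category]  # constant within an island
        totals[(category, grp)] += amount
        result.append((day, category, amount, grp, totals[(category, grp)]))
    return result


for row in running_sum_with_reset(SAMPLE):
    print(row)
```

Note that the catB island and the last catA row both get grp 3, which is exactly why Category has to be part of the match.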

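Select the sum of amounts from the current row back to (but not including) the most recent earlier row that has a different category. In your case, ordering is a date, so you will need to replace the 0 fallback in ISNULL(MAX(ordering), 0) with some minimum date that SQL Server supports, like '17530101':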
DECLARE @t TABLE
(
category INT ,
amount INT ,
ordering INT
)
INSERT INTO @t
VALUES ( 1, 1, 1 ),
( 1, 2, 2 ),
( 1, 3, 3 ),
( 2, 4, 4 ),
( 2, 5, 5 ),
( 3, 6, 6 ),
( 1, 7, 7 ),
( 1, 8, 8 ),
( 4, 9, 9 ),
( 1, 10, 10 )
SELECT category ,
amount ,
-- sum the same-category rows up to the current one...
( SELECT SUM(amount)
FROM @t
WHERE category = t.category
AND ordering <= t.ordering
-- ...but only those after the last earlier row of a different category
AND ordering > ( SELECT ISNULL(MAX(ordering), 0)
FROM @t
WHERE category <> t.category
AND ordering < t.ordering
)
) AS sum
FROM @t t
ORDER BY t.ordering
Output:
category amount sum
1 1 1
1 2 3
1 3 6
2 4 4
2 5 9
3 6 6
1 7 7
1 8 15
4 9 9
1 10 10
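The two correlated subqueries can be mirrored row by row in plain Python, which makes the "boundary" idea explicit. This is only a model of the query above using its sample data; like the correlated subqueries, it rescans all rows for every output row, so it is quadratic.

```python
# Sample data from the answer: (category, amount, ordering).
ROWS = [(1, 1, 1), (1, 2, 2), (1, 3, 3), (2, 4, 4), (2, 5, 5),
        (3, 6, 6), (1, 7, 7), (1, 8, 8), (4, 9, 9), (1, 10, 10)]


def reset_sums(rows):
    """Mirror the correlated subqueries: one (category, amount, sum) per row."""
    result = []
    for category, amount, ordering in sorted(rows, key=lambda r: r[2]):
        # ISNULL(MAX(ordering), 0) over earlier rows of a different category
        boundary = max((o for c, _, o in rows if c != category and o < ordering), default=0)
        # SUM(amount) over same-category rows after that boundary, up to this row
        total = sum(a for c, a, o in rows if c == category and boundary < o <= ordering)
        result.append((category, amount, total))
    return result


for row in reset_sums(ROWS):
    print(*row)
```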


First 7 days sales

I want to check the sum of the amount for an item over the 7 days starting from its first day of sale. Basically, I want to check the sum of sales for the first 7 days.
I am using the below query.
select item, sum(amt)
from table
where first_sale_dt = (first_sale_dt + 6).
When I run this query, I don't get any results.
Your code as it stands will give you no results, because you are looking at each row and asking whether first_sale_dt is equal to first_sale_dt + 6, which it never is.
You need to use a WINDOW function to look across many rows, OR self-join the table and filter the rows that are joined to give the result you want.
So, with this CTE of data for testing:
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 4, '2022-03-04'::date),
(1, 200,'2022-04-01'::date),
(3, 20, '2022-03-01'::date)
t(item, amt, first_sale_dt)
)
This SQL shows the filtered rows that we want to SUM. It uses a sub-select (which could be moved into a CTE) to find the "first first sale" of each item, which anchors the date range.
select a.item, b.amt
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
ITEM AMT
1 2
1 4
3 20
and therefore with a SUM added:
select a.item, sum(b.amt)
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
group by 1;
you get:
ITEM SUM(B.AMT)
1 6
3 20
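The self-join logic (find each item's first sale, keep rows up to 6 days later, then SUM) can be checked in plain Python against the same test data. This is a model of the query's logic, not Snowflake code.

```python
from datetime import date, timedelta

# Test data from the answer: (item, amt, first_sale_dt).
DATA = [
    (1, 2, date(2022, 3, 1)),
    (1, 4, date(2022, 3, 4)),
    (1, 200, date(2022, 4, 1)),
    (3, 20, date(2022, 3, 1)),
]


def first_week_totals(rows):
    # Sub-select: min(first_sale_dt) per item (the "first first sale")
    first_sale = {}
    for item, _, sale_dt in rows:
        first_sale[item] = min(sale_dt, first_sale.get(item, sale_dt))
    # Join back, keep rows on or before first sale + 6 days, then SUM per item
    totals = {}
    for item, amt, sale_dt in rows:
        if sale_dt <= first_sale[item] + timedelta(days=6):
            totals[item] = totals.get(item, 0) + amt
    return totals


print(first_week_totals(DATA))
```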
Sliding Window:
This relies on dense data (1 row for every day), because ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING counts rows, not days. Also, the sliding WINDOW does work that then gets thrown away, which is a strong sign this is not the performant solution; I would stick to the first solution.
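The dense-data caveat is easy to demonstrate with a small Python model: a ROWS frame of "current row and 6 following" matches a 7-day window only when there is exactly one row per day. The sparse sample below is hypothetical, added just for the comparison.

```python
from datetime import date, timedelta


def row_window_sums(rows, following=6):
    """Model of SUM(amt) OVER (ORDER BY dt ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING)."""
    ordered = sorted(rows)
    return [sum(a for _, a in ordered[i:i + following + 1]) for i in range(len(ordered))]


def date_window_sums(rows, days=7):
    """Sum of amounts dated in [dt, dt + days) for each row's date dt."""
    ordered = sorted(rows)
    return [sum(a for d2, a in ordered if d <= d2 < d + timedelta(days=days))
            for d, _ in ordered]


DENSE = [(date(2022, 3, 1) + timedelta(days=i), 2) for i in range(8)]  # the answer's data
SPARSE = [(date(2022, 3, 1), 2), (date(2022, 3, 4), 4), (date(2022, 3, 20), 8)]  # hypothetical

print(row_window_sums(DENSE)[0], date_window_sums(DENSE)[0])    # same on dense data
print(row_window_sums(SPARSE)[0], date_window_sums(SPARSE)[0])  # differ on sparse data
```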
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
first_sale_dt,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
,count(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as c
from data
order by 2;
ITEM FIRST_SALE_DT S C
1 2022-03-01 14 7
1 2022-03-02 14 7
1 2022-03-03 12 6
1 2022-03-04 10 5
1 2022-03-05 8 4
1 2022-03-06 6 3
1 2022-03-07 4 2
1 2022-03-08 2 1
Thus you then need to filter out the extra rows:
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
from data
qualify row_number() over (partition by item order by first_sale_dt) = 1
gives:
ITEM S
1 14
If you really want to use a window function, here is a beginner-friendly version:
with cte as
(select *, min(first_sale_dt) over (partition by item) as sale_start_date
from data) --thanks Simeon
select item, sum(amt) as amount
from cte
where first_sale_dt <= sale_start_date + 6 --limit to first week
group by item;
On a side note, I suggest using dateadd instead of + on dates

T-SQL Select all combinations of ranges that meet aggregate criteria

Problem restated per comments
Say we have the following integer ids and counts...
id count
1 0
2 10
3 0
4 0
5 0
6 1
7 9
8 0
We also have a variable @id_range int.
Given a value for @id_range, how can we select all combinations of id ranges, without using while loops or cursors, that meet the following criteria?
1) No two ranges in a combination can overlap (min and max of each range are inclusive)
2) sum(count) for a combination of ranges must equal sum(count) of the initial data set (20 in this case)
3) Only include ranges where sum(count) > 0
The simplest case would be when @id_range = max(id) - min(id), or 7 given the above data. In this case, there's only one solution:
minId maxId count
---------------------
1 8 20
But if @id_range = 1, for example, there would be 4 possible solutions:
Solution 1:
minId maxId count
---------------------
1 2 10
5 6 1
7 8 9
Solution 2:
minId maxId count
---------------------
1 2 10
6 7 10
Solution 3:
minId maxId count
---------------------
2 3 10
5 6 1
7 8 9
Solution 4:
minId maxId count
---------------------
2 3 10
6 7 10
The end goal is to identify which solutions have the fewest number of ranges (solutions 2 and 4 in the example above, where @id_range = 1).
This solution does not list all possible combinations; it just tries to group the rows into the smallest possible number of ranges.
Hopefully it covers all possible scenarios.
-- create the sample table
declare @sample table
(
id int,
[count] int
)
-- insert some sample data
insert into @sample select 1, 0
insert into @sample select 2, 10
insert into @sample select 3, 0
insert into @sample select 4, 0
insert into @sample select 5, 0
insert into @sample select 6, 1
insert into @sample select 7, 9
insert into @sample select 8, 0
-- the @id_range
declare @id_range int = 1
-- the query
; with
cte as
(
-- this cte gives consecutive rows with the same sign([count]) the same grp,
-- so rows with count > 0 are grouped into runs
-- sign(0) gives 0, sign(+value) gives 1
-- basically it is the same as case when [count] > 0 then 1 else 0 end
select *,
grp = row_number() over (order by id)
- dense_rank() over(order by sign([count]), id)
from @sample
),
cte2 as
(
-- for each grp in cte, assign a sub group (grp2). each sub group
-- contains at most @id_range + 1 rows, so (with consecutive integer ids)
-- its ids fit between min(id) and min(id) + @id_range
select *,
grp2 = (row_number() over (partition by grp order by id) - 1)
/ (@id_range + 1)
from cte
where [count] > 0
)
select MinId = min(id),
MaxId = min(id) + @id_range,
[count] = sum([count])
from cte2
group by grp, grp2
order by MinId
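Here is a plain-Python model of the two CTEs, to make the island and sub-group arithmetic concrete. It assumes unique, consecutive integer ids and non-negative counts (so sign() is 0 or 1), and, like the answer, it only chunks within runs of consecutive nonzero ids rather than enumerating every combination.

```python
# Sample data from the question: (id, count).
SAMPLE_COUNTS = [(1, 0), (2, 10), (3, 0), (4, 0), (5, 0), (6, 1), (7, 9), (8, 0)]


def group_ranges(rows, id_range):
    """Return sorted (MinId, MaxId, count) tuples, mirroring cte/cte2."""
    rows = sorted(rows)
    # row_number() over (order by id)
    rn = {id_: i for i, (id_, _) in enumerate(rows, start=1)}
    # dense_rank() over (order by sign([count]), id); ids are unique, so no ties
    by_sign = sorted(rows, key=lambda r: (1 if r[1] > 0 else 0, r[0]))
    dr = {id_: i for i, (id_, _) in enumerate(by_sign, start=1)}

    islands = {}  # grp -> rows with count > 0, in id order
    for id_, cnt in rows:
        if cnt > 0:
            islands.setdefault(rn[id_] - dr[id_], []).append((id_, cnt))

    result = []
    for members in islands.values():
        # grp2 = (row_number within grp - 1) / (id_range + 1)
        for start in range(0, len(members), id_range + 1):
            chunk = members[start:start + id_range + 1]
            min_id = chunk[0][0]
            result.append((min_id, min_id + id_range, sum(c for _, c in chunk)))
    return sorted(result)


print(group_ranges(SAMPLE_COUNTS, 1))
```

With @id_range = 1 this reproduces Solution 4 from the question.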

How to COUNT rows according to specific complicated rules?

I have the following table:
custid custname channelid channel dateViewed
--------------------------------------------------------------
1 A 1 ABSS 2016-01-09
2 B 2 STHHG 2016-01-19
3 C 4 XGGTS 2016-01-09
6 D 4 XGGTS 2016-01-09
2 B 2 STHHG 2016-01-26
2 B 2 STHHG 2016-01-28
1 A 3 SSJ 2016-01-28
1 A 1 ABSS 2016-01-28
2 B 2 STHHG 2016-02-02
2 B 7 UUJKS 2016-02-10
2 B 8 AKKDC 2016-02-10
2 B 9 GGSK 2016-02-10
2 B 9 GGSK 2016-02-11
2 B 7 UUJKS 2016-02-27
And I want the results to be:
custid custname month count
------------------------------
1 A 1 1
2 B 1 1
2 B 2 4
3 C 1 1
6 D 1 1
According to the following rules:
Channel view subscriptions are billed every 15 days. If the customer viewed the same channel within the 15 days, he will only be billed once for that channel. For instance, for custid 2, custname B, the billing cycles are 19 Jan - 3 Feb (one billing cycle), 4 Feb - 20 Feb (one billing cycle) and so on. Therefore, he is billed only once in Jan, since he watched the same channel throughout the billing cycle; and he is billed 4 times in Feb: for watching channelids 7, 8 and 9, and for channelid 7 watched on 27 Feb (since this falls in another billing cycle, customer B is also charged here). Customer B is not charged on 2 Feb for watching channel 2, since he was already billed in the 19 Jan - 3 Feb billing cycle.
An invoice is generated every month for each customer; therefore, the results should show the 'Month' and the 'Count' of the channels viewed for each customer.
Can this be done in SQL server?
;WITH cte AS (
SELECT custid,
custname,
channelid,
channel,
dateViewed,
CAST(DATEADD(day,15,dateViewed) as date) as dateEnd,
ROW_NUMBER() OVER (PARTITION BY custid, channelid ORDER BY dateViewed) AS rn
FROM (VALUES
(1, 'A', 1, 'ABSS', '2016-01-09'),(2, 'B', 2, 'STHHG', '2016-01-19'),
(3, 'C', 4, 'XGGTS', '2016-01-09'),(6, 'D', 4, 'XGGTS', '2016-01-09'),
(2, 'B', 2, 'STHHG', '2016-01-26'),(2, 'B', 2, 'STHHG', '2016-01-28'),
(1, 'A', 3, 'SSJ', '2016-01-28'),(1, 'A', 1, 'ABSS', '2016-01-28'),
(2, 'B', 2, 'STHHG', '2016-02-02'),(2, 'B', 7, 'UUJKS', '2016-02-10'),
(2, 'B', 8, 'AKKDC', '2016-02-10'),(2, 'B', 9, 'GGSK', '2016-02-10'),
(2, 'B', 9, 'GGSK', '2016-02-11'),(2, 'B', 7, 'UUJKS', '2016-02-27')
) as t(custid, custname, channelid, channel, dateViewed)
), res AS (
SELECT custid, channelid, dateViewed, dateEnd, 1 as Lev
FROM cte
WHERE rn = 1
UNION ALL
SELECT c.custid, c.channelid, c.dateViewed, c.dateEnd, lev + 1
FROM res r
INNER JOIN cte c ON c.dateViewed > r.dateEnd and c.custid = r.custid and c.channelid = r.channelid
), final AS (
SELECT * ,
ROW_NUMBER() OVER (PARTITION BY custid, channelid, lev ORDER BY dateViewed) rn,
DENSE_RANK() OVER (ORDER BY custid, channelid, dateEnd) dr
FROM res
)
SELECT b.custid,
b.custname,
MONTH(f.dateViewed) as [month],
COUNT(distinct dr) as [count]
FROM cte b
LEFT JOIN final f
ON b.channelid = f.channelid and b.custid = f.custid and b.dateViewed between f.dateViewed and f.dateEnd
WHERE f.rn = 1
GROUP BY b.custid,
b.custname,
MONTH(f.dateViewed)
Output:
custid custname month count
----------- -------- ----------- -----------
1 A 1 3
2 B 1 1
2 B 2 4
3 C 1 1
6 D 1 1
(5 row(s) affected)
I don't know why you expect a count of 1 for customer A. He has:
ABSS 2016-01-09 +1 to count (+15 days = 2016-01-24)
SSJ 2016-01-28 +1 to count
ABSS 2016-01-28 +1 to count (2016-01-28 > 2016-01-24)
So the January count must be 3.
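As a cross-check of these counts, here is a plain-Python sketch of the rule the recursive CTE appears to implement (my reading of it, not a translation): a view is charged if it is the first view of that channel by that customer, or if it falls after dateEnd, i.e. more than 15 days after the view that opened the previous charged cycle. Month is taken as a plain month number, assuming all data is in one year.

```python
from collections import Counter
from datetime import date, timedelta

# (custid, channelid, dateViewed) from the question's sample table.
VIEWS = [
    (1, 1, date(2016, 1, 9)), (2, 2, date(2016, 1, 19)),
    (3, 4, date(2016, 1, 9)), (6, 4, date(2016, 1, 9)),
    (2, 2, date(2016, 1, 26)), (2, 2, date(2016, 1, 28)),
    (1, 3, date(2016, 1, 28)), (1, 1, date(2016, 1, 28)),
    (2, 2, date(2016, 2, 2)), (2, 7, date(2016, 2, 10)),
    (2, 8, date(2016, 2, 10)), (2, 9, date(2016, 2, 10)),
    (2, 9, date(2016, 2, 11)), (2, 7, date(2016, 2, 27)),
]


def monthly_charges(views, cycle_days=15):
    """Count charged views per (custid, month) under a greedy 15-day rule."""
    last_charged = {}  # (custid, channelid) -> date of the view that opened the cycle
    charges = Counter()
    for cust, channel, viewed in sorted(views, key=lambda v: v[2]):
        key = (cust, channel)
        # Charge the first view, or any view after dateEnd = cycle start + 15 days
        if key not in last_charged or viewed > last_charged[key] + timedelta(days=cycle_days):
            last_charged[key] = viewed
            charges[(cust, viewed.month)] += 1
    return dict(charges)


print(monthly_charges(VIEWS))
```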
Whenever I am trying to count things with complex criteria, I use SUM over a CASE expression. Something like below:
SELECT custid, custname,
SUM(CASE WHEN somecriteria
THEN 1
ELSE 0
END) As CriteriaCount
FROM whateverTable
GROUP BY custid, custname
You can make the somecriteria placeholder as complicated a condition as you like, so long as it evaluates to true or false. If it passes, the row returns a 1; if it fails, the row returns a 0. Then we sum up the returned values to get the count.
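The same "sum of 1s and 0s" idea in plain Python, as a small illustration: the rows and the criterion (channelid >= 2) are hypothetical stand-ins for whateverTable and somecriteria. One useful side effect, shared with the SQL pattern, is that groups with no matching rows still appear with a count of 0, which a WHERE filter would drop.

```python
def criteria_count(rows, criteria):
    """GROUP BY custid, custname with SUM(CASE WHEN criteria THEN 1 ELSE 0 END)."""
    counts = {}
    for row in rows:
        key = (row["custid"], row["custname"])
        counts[key] = counts.get(key, 0) + (1 if criteria(row) else 0)
    return counts


# Hypothetical rows and criterion, for illustration only.
SAMPLE_ROWS = [
    {"custid": 1, "custname": "A", "channelid": 1},
    {"custid": 2, "custname": "B", "channelid": 2},
    {"custid": 2, "custname": "B", "channelid": 7},
]
print(criteria_count(SAMPLE_ROWS, lambda r: r["channelid"] >= 2))
```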
Generally, this is how you can get any number (10 in this example) of fixed 15-day intervals starting at a given date (@dd in this example).
DECLARE @dd date = CAST('2016-01-19 17:30' AS DATE);
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows max
tally(N) AS (SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
SELECT
startd = DATEADD(D,(N-1)*15, @dd),
endd = DATEADD(D, N*15-1, @dd)
FROM tally
Adapt it to the rules defining how the start date must be calculated for each user (and probably each channel).
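The tally query's date arithmetic, modelled in Python: interval N runs from start + (N-1)*15 days to start + N*15 - 1 days, so each window is 15 days long including both ends. Note that the first window therefore ends on 2016-02-02, one day before the asker's "19 Jan - 3 Feb" cycle; the length would need adjusting to match that rule exactly.

```python
from datetime import date, timedelta


def fixed_intervals(start, count=10, length=15):
    """(startd, endd) pairs like DATEADD(D, (N-1)*15, @dd) / DATEADD(D, N*15-1, @dd)."""
    return [
        (start + timedelta(days=(n - 1) * length), start + timedelta(days=n * length - 1))
        for n in range(1, count + 1)
    ]


for startd, endd in fixed_intervals(date(2016, 1, 19), count=3):
    print(startd, endd)
```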
@Sturgus what if I want to define it in the code? Are there any alternatives besides defining it in the table? How do I write a query that can be run every month to generate the monthly invoice? – saturday
Well, one way or another, you will have to save each customer's billing start date (minimally). If you want to do this entirely in SQL without 'editing the database', something like the following should work. The drawback to this approach is that you would need to manually edit the "INSERT INTO" statement every month to suit your needs. If you were allowed to edit the already existing customers table or create a new one, then it would reduce this manual effort.
DECLARE @CustomerBillingPeriodsTVP AS Table(
custID int UNIQUE,
BillingCycleID int,
BillingStartDate Date,
BillingEndDate Date
);
INSERT INTO @CustomerBillingPeriodsTVP (custID, BillingCycleID, BillingStartDate, BillingEndDate) VALUES
(1, 1, '2016-01-03', '2016-01-18'), (2, 1, '2016-01-18', '2016-02-03'), (3, 1, '2016-01-15', '2016-01-30'), (6, 1, '2016-01-14', '2016-01-29');
SELECT A.custid, A.custname, B.BillingCycleID AS [month], COUNT(DISTINCT A.channelid) AS [count]
FROM dbo.tblCustomerChannelViews AS A INNER JOIN @CustomerBillingPeriodsTVP AS B ON A.custid = B.CustID
GROUP BY A.custid, A.custname, B.BillingCycleID;
GO
Where are you getting your customers' billing start dates as it is?
I'm not sure how this solution will scale, but with some good index candidates and decent data housekeeping it'll work.
You're going to need some extra info for starters, and to normalize your data. You will need to know the first charging period start date for each customer. So store that in a customer table.
Here are the tables I used:
create table #channelViews
(
custId int, channelId int, viewDate datetime
)
create table #channel
(
channelId int, channelName varchar(max)
)
create table #customer
(
custId int, custname varchar(max), chargingStartDate datetime
)
I'll populate some data. I won't get the same results as your sample output, because I don't have the appropriate start dates for each customer. Customer 2 will be OK though.
insert into #channel (channelId, channelName)
select 1, 'ABSS'
union select 2, 'STHHG'
union select 4, 'XGGTS'
union select 3, 'SSJ'
union select 7, 'UUJKS'
union select 8, 'AKKDC'
union select 9, 'GGSK'
insert into #customer (custId, custname, chargingStartDate)
select 1, 'A', '4 Jan 2016'
union select 2, 'B', '19 Jan 2016'
union select 3, 'C', '5 Jan 2016'
union select 6, 'D', '5 Jan 2016'
insert into #channelViews (custId, channelId, viewDate)
select 1,1,'2016-01-09'
union select 2,2,'2016-01-19'
union select 3,4,'2016-01-09'
union select 6,4,'2016-01-09'
union select 2,2,'2016-01-26'
union select 2,2,'2016-01-28'
union select 1,3,'2016-01-28'
union select 1,1,'2016-01-28'
union select 2,2,'2016-02-02'
union select 2,7,'2016-02-10'
union select 2,8,'2016-02-10'
union select 2,9,'2016-02-10'
union select 2,9,'2016-02-11'
union select 2,7,'2016-02-27'
And here is the somewhat unwieldy query, in a single statement.
The two underlying sub-queries are actually the same data, so there may be more appropriate / efficient ways to generate these.
We need to exclude from billing any channel already charged in the same charging period in the previous month. This is the essence of the join. I used a right join so that I could exclude all such matches from the results (using old.custId is null).
select c.custId, c.[custname], [month], count(*) [count] from
(
select new.custId, new.channelId, new.month, new.chargingPeriod
from
(
select distinct cv.custId, cv.channelId, month(viewdate) [month], (convert(int, cv.viewDate) - convert(int, c.chargingStartDate))/15 chargingPeriod
from #channelViews cv join #customer c on cv.custId = c.custId
) old
right join
(
select distinct cv.custId, cv.channelId, month(viewdate) [month], (convert(int, cv.viewDate) - convert(int, c.chargingStartDate))/15 chargingPeriod
from #channelViews cv join #customer c on cv.custId = c.custId
) new
on old.custId = new.custId
and old.channelId = new.channelId
and old.month = new.Month -1
and old.chargingPeriod = new.chargingPeriod
where old.custId is null
group by new.custId, new.month, new.chargingPeriod, new.channelId
) filteredResults
join #customer c on c.custId = filteredResults.custId
group by c.custId, [month], c.custname
order by c.custId, [month], c.custname
And finally my results:
custId custname month count
1 A 1 3
2 B 1 1
2 B 2 4
3 C 1 1
6 D 1 1
This query does the same thing:
select c.custId, c.custname, [month], count(*) from
(
select cv.custId, min(month(viewdate)) [month], cv.channelId
from #channelViews cv join #customer c on cv.custId = c.custId
group by cv.custId, cv.channelId, (convert(int, cv.viewDate) - convert(int, c.chargingStartDate))/15
) x
join #customer c
on c.custId = x.custId
group by c.custId, c.custname, x.[month]
order by custId, [month]
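The second query boils down to: bucket each view into chargingPeriod = (viewDate - chargingStartDate) / 15 with integer division, count one charge per (custId, channelId, chargingPeriod), and attribute it to the earliest month in that bucket. A plain-Python sketch of that, using the answer's sample data and start dates. Python's // floors while T-SQL integer division truncates toward zero; they agree here because every view is on or after the customer's start date.

```python
from collections import Counter
from datetime import date

# (custId, channelId, viewDate) and chargingStartDate values from the answer.
VIEWS = [
    (1, 1, date(2016, 1, 9)), (2, 2, date(2016, 1, 19)),
    (3, 4, date(2016, 1, 9)), (6, 4, date(2016, 1, 9)),
    (2, 2, date(2016, 1, 26)), (2, 2, date(2016, 1, 28)),
    (1, 3, date(2016, 1, 28)), (1, 1, date(2016, 1, 28)),
    (2, 2, date(2016, 2, 2)), (2, 7, date(2016, 2, 10)),
    (2, 8, date(2016, 2, 10)), (2, 9, date(2016, 2, 10)),
    (2, 9, date(2016, 2, 11)), (2, 7, date(2016, 2, 27)),
]
CHARGING_START = {1: date(2016, 1, 4), 2: date(2016, 1, 19),
                  3: date(2016, 1, 5), 6: date(2016, 1, 5)}


def charges_by_month(views, charging_start, cycle_days=15):
    # One charge per (custId, channelId, chargingPeriod), attributed to
    # min(month(viewdate)) within that group.
    first_month = {}
    for cust, channel, viewed in views:
        period = (viewed - charging_start[cust]).days // cycle_days
        key = (cust, channel, period)
        first_month[key] = min(viewed.month, first_month.get(key, viewed.month))
    counts = Counter((cust, month) for (cust, _, _), month in first_month.items())
    return dict(counts)


print(charges_by_month(VIEWS, CHARGING_START))
```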

SQL grouping intersecting/overlapping rows

I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.
create table data
( a_sno integer not null,
b_sno integer not null,
PRIMARY KEY (a_sno,b_sno)
);
insert into data (a_sno,b_sno) values
( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);
As you can see from the first 6 rows, the data values 4, 5, 6 and 7 in the two columns intersect/overlap and need to be partitioned into one group. The same goes for rows 7-16 and rows 17-18, which will be labeled as groups 2 and 3 respectively.
The resulting output should look like this:
group | value
------+------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
This assumes that all pairs also exist in their mirrored combination, e.g. (4,5) and (5,4). But the following solutions work just as well without mirrored dupes.
Simple case
If all connections can be lined up in a single ascending sequence, and complications like the ones I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:
I start by getting minimum a_sno per group, with the minimum associated b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp
, a_sno, min(b_sno) AS b_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno;
This only needs a single query level since a window function can be built on an aggregate:
Get the distinct sum of a joined table column
Result:
grp a_sno b_sno
1 4 5
2 9 10
3 11 15
I avoid branches and duplicated (multiplicated) rows - potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE.
Create a unique index on a non-unique column
Key to performance is a matching index, which is already provided by the PK constraint PRIMARY KEY (a_sno,b_sno), not the other way round (b_sno, a_sno):
Is a composite index also good for queries on the first field?
WITH RECURSIVE t AS (
SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
, a_sno, min(b_sno) AS b_sno -- the smallest one
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp
, (SELECT b_sno -- correlated subquery
FROM data
WHERE a_sno = c.sno
AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1)
FROM cte c
WHERE c.sno IS NOT NULL
)
SELECT * FROM cte
WHERE sno IS NOT NULL -- eliminate row with NULL
UNION ALL -- no duplicates
SELECT grp, a_sno FROM t
ORDER BY grp, sno;
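A plain-Python model of the "simple case" walk, to show what the recursive CTE does: roots are a_sno values that have a greater neighbor but no smaller one, and from each root we repeatedly step to the smallest greater neighbor, which is what ORDER BY b_sno LIMIT 1 picks. Like the SQL, it is only correct under the simple-case assumption that this chain visits every member of the group.

```python
# The question's pairs (a_sno, b_sno), including mirrored ones.
EDGES = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6),
         (9, 10), (9, 13), (10, 9), (13, 9), (10, 13), (13, 10),
         (10, 14), (14, 10), (13, 14), (14, 13), (11, 15), (15, 11)]


def ascending_chains(edges):
    greater = {}  # a_sno -> all b_sno with a_sno < b_sno
    for a, b in edges:
        if a < b:
            greater.setdefault(a, []).append(b)
    has_smaller = {b for a, b in edges if a < b}
    # NOT EXISTS (... WHERE b_sno = d.a_sno AND a_sno < b_sno)
    roots = sorted(a for a in greater if a not in has_smaller)
    groups = []
    for grp, root in enumerate(roots, start=1):  # row_number() OVER (ORDER BY a_sno)
        members, node = [root], root
        while node in greater:
            node = min(greater[node])  # ORDER BY b_sno LIMIT 1
            members.append(node)
        groups.append((grp, members))
    return groups


print(ascending_chains(EDGES))
```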
Less simple case
All nodes can be reached in ascending order with one or more branches from the root (smallest sno).
This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:
WITH RECURSIVE t AS (
SELECT rank() OVER (ORDER BY d.a_sno) AS grp
, a_sno, b_sno -- get all rows for smallest a_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp, d.b_sno
FROM cte c
JOIN data d ON d.a_sno = c.sno
AND d.a_sno < d.b_sno -- join to all connected rows
)
SELECT grp, sno FROM cte
UNION -- eliminate duplicates
SELECT grp, a_sno FROM t -- add first rows
ORDER BY grp, sno;
Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).
Both should perform very well - especially with long chains / many branches. Result as desired:
SQL Fiddle (with added rows to demonstrate difficulty).
Undirected graph
If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.
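For the general undirected case, the grouping is simply the connected components of the graph. As a swapped-in technique (a plain breadth-first search in Python, not Farhęg's SQL), the sketch below treats every pair as an undirected edge. The second call uses a hypothetical graph with a local minimum (2 only connects upward to 3), which the ascending walks above would split into two groups.

```python
from collections import defaultdict, deque

# The question's pairs (a_sno, b_sno).
EDGES = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6),
         (9, 10), (9, 13), (10, 9), (13, 9), (10, 13), (13, 10),
         (10, 14), (14, 10), (13, 14), (14, 13), (11, 15), (15, 11)]


def connected_groups(edges):
    adj = defaultdict(set)
    for a, b in edges:  # undirected: store both directions
        adj[a].add(b)
        adj[b].add(a)
    seen, result, grp = set(), [], 0
    for start in sorted(adj):  # smallest unvisited sno starts the next group
        if start in seen:
            continue
        grp += 1
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.extend((grp, sno) for sno in sorted(members))
    return result


print(connected_groups(EDGES))
print(connected_groups([(1, 3), (2, 3)]))  # hypothetical local-minimum case
```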
Here is another way that may be useful; you can do it in 2 steps:
1. Take the max(sno) of each group:
select q.sno,
row_number() over(order by q.sno) gn
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
result:
sno gn
7 1
14 2
15 3
2. Use a recursive CTE to find all related members of each group:
with recursive cte(sno,gn,path,cycle)as(
select q.sno,
row_number() over(order by q.sno) gn,
array[q.sno],false
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
union all
select d.a_sno,c.gn,
d.a_sno || c.path,
d.a_sno=any(c.path)
from data d
join cte c on d.b_sno=c.sno
where not cycle
)
select distinct gn,sno from cte
order by gn,sno
Result:
gn sno
1 4
1 5
1 6
1 7
2 9
2 10
2 13
2 14
3 11
3 15
Here is the demo of what I did.
Here is a start that may give some ideas on an approach. The recursive query starts with a_sno of each record and then tries to follow the path of b_sno until it reaches the end or forms a cycle. The path is represented by an array of sno integers.
The unnest function will break the array into rows, so a sno value mapped to the path array such as:
4, {6, 5, 4}
will be transformed to a row for each value in the array:
4, 6
4, 5
4, 4
array_agg then reverses the operation by aggregating the values back into a path, but with the duplicates removed and the values ordered.
Now each a_sno is associated with a path and the path forms the grouping. dense_rank can be used to map the grouping (cluster) to a numeric.
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
Output:
cluster | sno
--------------+-----
{4,5,6,7} | 4
{4,5,6,7} | 5
{4,5,6,7} | 6
{4,5,6,7} | 7
{9,10,13,14} | 9
{9,10,13,14} | 10
{9,10,13,14} | 13
{9,10,13,14} | 14
{11,15} | 11
{11,15} | 15
(10 rows)
Wrap it one more time for the ranking:
SELECT dense_rank() OVER(order by cluster) AS rank
,sno
FROM (
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
) z
ORDER BY rank, sno
Output:
rank | sno
------+-----
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
(10 rows)
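A plain-Python model of the cluster-then-rank idea: every path the recursive query builds ends at some sno, and the cluster for that sno is the sorted, de-duplicated set of nodes on those paths, which is the set of nodes that can reach it (plus itself). Sorted tuples compare element by element, which matches how the int[] clusters are ordered here, so dense ranking the distinct tuples gives the group numbers. This computes reachability directly rather than materializing every path.

```python
from collections import defaultdict

# The question's pairs (a_sno, b_sno).
EDGES = [(4, 5), (5, 4), (5, 6), (6, 5), (6, 7), (7, 6),
         (9, 10), (9, 13), (10, 9), (13, 9), (10, 13), (13, 10),
         (10, 14), (14, 10), (13, 14), (14, 13), (11, 15), (15, 11)]


def reachable(succ, start):
    """All nodes reachable from start by following a_sno -> b_sno, including start."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rank_clusters(edges):
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    starts = {a for a, _ in edges}  # the recursion is seeded with every a_sno
    reach = {s: reachable(succ, s) for s in starts}
    all_snos = starts | {b for _, b in edges}
    # array_agg(DISTINCT map ORDER BY map) per sno
    cluster = {t: tuple(sorted({s for s in starts if t in reach[s]} | {t})) for t in all_snos}
    # dense_rank() OVER (ORDER BY cluster)
    ranks = {c: i for i, c in enumerate(sorted(set(cluster.values())), start=1)}
    return sorted((ranks[cluster[t]], t) for t in all_snos)


print(rank_clusters(EDGES))
```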

Soccer league table standings with SQL

Perhaps a familiar table for many people. A soccer league table.
But there is one mistake in this list: the teams ranked 4 and 5 are tied on every criterion, so they should not be ranked 4 and 5 but 4 and 4, and the ranking should then continue with 6.
Ranking | Team | Points | Goals difference | Goals scored | Goals against
1 A 3 4 4 0
2 B 3 3 3 0
3 C 3 1 2 1
4 D 3 1 1 0
5 E 3 1 1 0
6 F 1 0 2 2
7 G 1 0 0 0
I have been trying to improve the MS SQL query that produces this table, by using a Common Table Expression and SELECT ROW_Number, but that never gives me the right result. Does anyone have a better idea?
You can do this easily by using the RANK() function.
declare @table as table
(
Team varchar(1),
Points int,
GoalsScored int,
GoalsAgainst int
)
insert into @table values ('A', 3, 4, 0),
('B', 3, 3, 0),
('C', 3, 2, 1),
('D', 3, 1, 0),
('E', 3, 1, 0),
('F', 1, 2, 2),
('G', 1, 0, 0)
select RANK() OVER (ORDER BY points desc, GoalsScored - GoalsAgainst desc, GoalsScored desc) AS Rank
,team
,points
,GoalsScored - GoalsAgainst as GoalsDifference
,GoalsScored
,GoalsAgainst
from @table
order by rank
Here is a possible solution. I'm not sure specifically how you are ranking so I've ranked based on Points DESC, Goals Diff DESC, Goals Scored DESC and Goals Against ASC.
;WITH
src AS (
SELECT Team, Points, GoalsDiff, GoalsScor, GoalsAga
FROM dbo.[stats]
)
,src2 AS (
SELECT Points, GoalsDiff, GoalsScor, GoalsAga
FROM src
GROUP BY Points, GoalsDiff, GoalsScor, GoalsAga
)
,src3 AS (
SELECT ROW_NUMBER() OVER (ORDER BY Points DESC, GoalsDiff DESC, GoalsScor DESC, GoalsAga) AS Ranking
,Points, GoalsDiff, GoalsScor, GoalsAga
FROM src2
)
SELECT src3.Ranking, src.Team, src.Points, src.GoalsDiff, src.GoalsScor, src.GoalsAga
FROM src
INNER JOIN src3
ON src.Points = src3.Points
AND src.GoalsDiff = src3.GoalsDiff
AND src.GoalsScor = src3.GoalsScor
AND src.GoalsAga = src3.GoalsAga
The basic approach I used is to select just the stats themselves then group them all. Once grouped then you can rank them and then join the grouped stats with ranking back to the original data to get your rank numbers against the teams. One way to think of it is that you are ranking the stats not the teams.
Hope this helps.
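The difference between the two answers can be seen with a small Python model of the same ordering (points, then goal difference, then goals scored, all descending). RANK-style numbering skips after a tie, so F gets 6 as the question asks. If I read the second answer correctly, numbering the distinct stat tuples with ROW_NUMBER and joining back behaves like DENSE_RANK, so F would get 5 instead.

```python
# Sample data from the first answer: (Team, Points, GoalsScored, GoalsAgainst).
TEAMS = [("A", 3, 4, 0), ("B", 3, 3, 0), ("C", 3, 2, 1), ("D", 3, 1, 0),
         ("E", 3, 1, 0), ("F", 1, 2, 2), ("G", 1, 0, 0)]


def sort_key(team):
    _, points, scored, against = team
    return (-points, -(scored - against), -scored)  # all three criteria descending


def rank(teams):
    """Like RANK(): tied teams share a rank, and the next rank skips ahead."""
    ordered = sorted(teams, key=sort_key)
    result = []
    for pos, team in enumerate(ordered, start=1):
        if pos > 1 and sort_key(team) == sort_key(ordered[pos - 2]):
            result.append((result[-1][0], team[0]))  # tie: reuse the previous rank
        else:
            result.append((pos, team[0]))            # otherwise rank = position
    return result


def dense_rank(teams):
    """Like DENSE_RANK(), or ROW_NUMBER() over the distinct stat tuples."""
    distinct = sorted({sort_key(t) for t in teams})
    pos = {k: i for i, k in enumerate(distinct, start=1)}
    return [(pos[sort_key(t)], t[0]) for t in sorted(teams, key=sort_key)]


print(rank(TEAMS))
print(dense_rank(TEAMS))
```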