Soccer league table standings with SQL - sql

Perhaps a familiar table for many people. A soccer league table.
But, in this list there is one mistake, rank 4 and 5, are totally equal, so these teams should not be ranked 4 and 5, but 4 and 4, and then the ranking should continue with 6.
Ranking | Team | Points | Goals difference | Goals scored | Goals against
1 A 3 4 4 0
2 B 3 3 3 0
3 C 3 1 2 1
4 D 3 1 1 0
5 E 3 1 1 0
6 F 1 0 2 2
7 G 1 0 0 0
I have been trying to improve the MS SQL query that produces this table, by using a Common Table Expression and SELECT ROW_Number, but that never gives me the right result. Does anyone have a better idea?

You can do this easy by using the RANK() function.
declare #table as table
(
Team varchar(1),
Points int,
GoalsScored int,
GoalsAgainst int
)
insert into #table values ('A', 3, 4, 0),
('B', 3, 3, 0),
('C', 3, 2, 1),
('D', 3, 1, 0),
('E', 3, 1, 0),
('F', 1, 2, 2),
('G', 1, 0, 0)
select RANK() OVER (ORDER BY points desc, GoalsScored - GoalsAgainst desc, GoalsScored desc) AS Rank
,team
,points
,GoalsScored - GoalsAgainst as GoalsDifference
,GoalsScored
,GoalsAgainst
from #table
order by rank

Here is a possible solution. I'm not sure specifically how you are ranking so I've ranked based on Points DESC, Goals Diff DESC, Goals Scored DESC and Goals Against ASC.
;WITH
src AS (
SELECT Team, Points, GoalsDiff, GoalsScor, GoalsAga
FROM dbo.[stats]
)
,src2 AS (
SELECT Points, GoalsDiff, GoalsScor, GoalsAga
FROM src
GROUP BY Points, GoalsDiff, GoalsScor, GoalsAga
)
,src3 AS (
SELECT ROW_NUMBER() OVER (ORDER BY Points DESC, GoalsDiff DESC, GoalsScor DESC, GoalsAga) AS Ranking
,Points, GoalsDiff, GoalsScor, GoalsAga
FROM src2
)
SELECT src3.Ranking, src.Team, src.Points, src.GoalsDiff, src.GoalsScor, src.GoalsAga
FROM src
INNER JOIN src3
ON src.Points = src3.Points
AND src.GoalsDiff = src3.GoalsDiff
AND src.GoalsScor = src3.GoalsScor
AND src.GoalsAga = src3.GoalsAga
The basic approach I used is to select just the stats themselves then group them all. Once grouped then you can rank them and then join the grouped stats with ranking back to the original data to get your rank numbers against the teams. One way to think of it is that you are ranking the stats not the teams.
Hope this helps.

Related

First 7 days sales

I want to check the sum of the amount for an item from its first day of sale next 7 days. Basically, I want to check the sum of sales for the first 7 days.
I am using the below query.
select item, sum(amt)
from table
where first_sale_dt = (first_sale_dt + 6).
When I run this query, I don't get any results.
Your code as it stands will give you no results, because you are looking at each row, and asking is the value first_sale_dt equal to a values it is not +6
You need to use a WINDOW function to look across many rows, OR self JOIN the table and filter the rows that are joined to give the result you want.
so with the CTE of data for testing:
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 4, '2022-03-04'::date),
(1, 200,'2022-04-01'::date),
(3, 20, '2022-03-01'::date)
t(item, amt, first_sale_dt)
)
this SQL show the filtered row that we are wanting to SUM, it is using a sub-select (which could be moved into a CTE) to find the "first first sale" to do the date range of.
select a.item, b.amt
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
ITEM
AMT
1
2
1
4
3
20
and therefore with a SUM added:
select a.item, sum(b.amt)
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
group by 1;
you get:
ITEM
SUM(B.AMT)
1
6
3
20
Sliding Window:
This is relying on dense data (1 row for every day), also the sliding WINDOW is doing work that is getting thrown away, which is a string sign this is not the performant solution and I would stick to the first solution.
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
first_sale_dt,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
,count(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as c
from data
order by 2;
ITEM
FIRST_SALE_DT
S
C
1
2022-03-01
14
7
1
2022-03-02
14
7
1
2022-03-03
12
6
1
2022-03-04
10
5
1
2022-03-05
8
4
1
2022-03-06
6
3
1
2022-03-07
4
2
1
2022-03-08
2
1
thus you need to then filter out some rows.
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
from data
qualify row_number() over (partition by item order by first_sale_dt) = 1
gives:
ITEM
S
1
14
If you really want to use window function. Here is beginner friendly version
with cte as
(select *, min(sale_date) over (partition by item) as sale_start_date
from data) --thanks Simeon
select item, sum(amt) as amount
from cte
where sale_date <= sale_start_date + 6 --limit to first week
group by item;
On a side note, I suggest using dateadd instead of + on dates

Find Comebacks in Football Database With Sql Query

I have 4 tables.
Matches:
| id | HomeTeamID | AwayTeamID |
--------|-------------|------------
| 1 | 1 | 2
| 2 | 1 | 3
| 3 | 3 | 1
Goals:
| id | MatchID | Minute | TeamID
--------|-------------|---------- |---------
| 1 | 1 | 3 | 2
| 2 | 1 | 5 | 1
| 3 | 1 | 15 | 1
| 4 | 2 | 43 | 3
| 5 | 2 | 75 | 1
| 6 | 2 | 85 | 1
| 7 | 3 | 11 | 1
| 8 | 3 | 13 | 3
| 9 | 3 | 77 | 3
Teams:
| id | Name |
--------|-------------|
| 1 | Chelsea |
| 2 | Arsenal |
| 3 | Tottenham |
Managers:
| id | Name | TeamID |
--------|-------------|-------------
| 1 | Conte | 1
| 2 | Wenger | 2
| 3 | Pochettino | 3
I want to get the number of comebacks matches for managers. For example, Conte's team conceded the first goal in match 1 and 2 but they won. So Conte has 2 comebacks. Pochettino has 1 comeback at match 3. I want to find that with SQL Query.
I found the first goal of matches for each team. But after some steps, I'm losing what I'm doing.
SELECT MatchID, MIN(minute), g.TeamID
FROM Goals g
JOIN Managers m ON m.TeamID = g.TeamID
GROUP BY MatchID, g.TeamID
with cte
(
MatchID,TeamID,TotalGoalTime,NoOfGoals,ManagerName,comeback)
as(SELECT MatchID, g.TeamID,sum(minutea) as'TotalGoalTime' ,count(*)as'NoOfGoals',m.name as'ManagerName'
,comeback =ROW_NUMBER() OVER(PARTITION BY MatchID order by sum(minutea) desc)
FROM Goals g
JOIN Managers m ON m.TeamID = g.TeamID
join [Teams] t on t.Id=g.TeamId
GROUP BY MatchID, g.TeamID,m.name )
Select MatchID,TeamID,NoOfGoals,ManagerName from cte where comeback =1
The above query for now gives us the overall comeback, Will update the no of come backs.
In case you want to count every comeback in football match, you can use the solution below. Then the definition of the comeback is every time when a team score one goal more the opponent, after loosing. For example, for the following scenario we have three comebacks:
Team A Team B
0 - 1 //team b scores
1 - 1 //team a scores
2 - 1 //team a scores (comeback for a)
2 - 2 //team b scores
2 - 3 //team b scores (comeback for b)
3 - 3 //team a scores
4 - 3 //team a scores (comeback for a)
From the above it seems that we have a comeback, when score is changed, and the previous score was even. I am using SUM with OVER and ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ordering by minute in order to calculated the score each time a goal is scored.
Here is full working example:
DECLARE #matches TABLE
(
[id] TINYINT
,[HomeTeamID] TINYINT
,[AwayTeamID] TINYINT
);
DECLARE #Goals TABLE
(
[id] TINYINT
,[MatchID] TINYINT
,[Minute] TINYINT
,[TeamID] TINYINT
);
DECLARE #Teams TABLE
(
[id] TINYINT
,[Name] VARCHAR(12)
);
DECLARE #Managers TABLE
(
[Id] TINYINT
,[Name] VARCHAR(12)
,[TeamID] TINYINT
);
INSERT INTO #matches ([id], [HomeTeamID], [AwayTeamID])
VALUES (1, 1, 2)
,(2, 1, 3)
,(3, 3, 1)
,(4, 1, 4);
INSERT INTO #Goals ([id], [MatchID], [Minute], [TeamID])
VALUES (1, 1, 3, 2)
,(2, 1, 5, 1)
,(3, 1, 15, 1)
,(4, 2, 43, 3)
,(5, 2, 75, 1)
,(6, 2, 85, 1)
,(7, 3, 11, 1)
,(8, 3, 13, 3)
,(9, 3, 77, 3)
,(10, 4, 3, 1)
,(11, 4, 5, 4)
,(12, 4, 10, 4)
,(13, 4, 12, 1)
,(14, 4, 25, 1)
,(15, 4, 46, 4)
,(16, 4, 60, 4)
,(17, 4, 72, 4)
,(18, 4, 84, 4);
INSERT INTO #Teams ([id], [Name])
VALUES (1, 'Chelsea')
,(2, 'Arsenal')
,(3, 'Tottenham')
,(4, 'Real Madrid');
INSERT INTO #Managers ([Id], [Name], [TeamID])
VALUES (1, 'Conte', 1)
,(2, 'Wenger', 2)
,(3, 'Pochettino', 3)
,(4, 'Zidane', 4);
WITH DataSource AS
(
SELECT m.[id]
,m.[HomeTeamID]
,m.[AwayTeamID]
,ROW_NUMBER() OVER (PARTITION BY m.[id] ORDER BY g.[minute]) AS [EventID]
,IIF
(
SUM(IIF(m.[HomeTeamID] = g.[teamID], 1, 0)) OVER (PARTITION BY m.[id] ORDER BY g.[minute] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1
=
SUM(IIF(m.[AwayTeamID] = g.[teamID], 1, 0)) OVER (PARTITION BY m.[id] ORDER BY g.[minute] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
OR
SUM(IIF(m.[HomeTeamID] = g.[teamID], 1, 0)) OVER (PARTITION BY m.[id] ORDER BY g.[minute] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
=
SUM(IIF(m.[AwayTeamID] = g.[teamID], 1, 0)) OVER (PARTITION BY m.[id] ORDER BY g.[minute] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1
,IIF(m.[HomeTeamID] = g.[teamID], 'H', 'A') -- (H)ome come back, (A)way come ba
,'N' -- no come back
) AS [ComeBack]
FROM #matches m
INNER JOIN #Goals g
ON m.[id] = g.[MatchID]
)
SELECT T.[Name]
FROM DataSource DS
INNER JOIN #Teams T
ON IIF([ComeBack] = 'H', [HomeTeamID], [AwayTeamID]) = T.[id]
WHERE DS.[EventID] <> 1
AND DS.[ComeBack] <> 'N';
The above will give us:
Chelsea
Chelsea
Chelsea
Tottenham
Real Madrid
Real Madrid
Note, I have added one more match to demonstrate this.
Here "Comeback" means that team conceded first goal, but team won this game.
I use 2 general subquery, 1) winners which contains MatchID and TeamID from each game which not ended as draw. And 2) first_goals, which contains those TeamID's, which scored first goal in match.
So, joining between these subqueries using:
on winners.MatchID = first_goals.MatchID and winners.TeamID <> first_goals.TeamID
gives us those matchs, where team won, but not scored first goal (i.e "Comeback").
Finally we use simple joins with Teams and Managers tables:
with Goals(id , MatchID , Minute ,TeamID) as (
select 1 , 1 , 3 , 2 union all
select 2 , 1 , 5 , 1 union all
select 3 , 1 , 15 , 1 union all
select 4 , 2 , 43 , 3 union all
select 5 , 2 , 75 , 1 union all
select 6 , 2 , 85 , 1 union all
select 7 , 3 , 11 , 1 union all
select 8 , 3 , 13 , 3 union all
select 9 , 3 , 77 , 3
),
Teams (id, Name) as(
select 1 ,'Chelsea' union all
select 2 ,'Arsenal' union all
select 3 ,'Tottenham'
),
Managers(id, Name, TeamID) as (
select 1 ,'Conte', 1 union all
select 2 ,'Wenger', 2 union all
select 3 ,'Pochettino', 3
)
select winners.TeamID, winners.MatchID, Teams.Name, Managers.Name from (
select t1.* from
(
select TeamID, MatchID, count(*) as goal_scored from Goals
group by TeamID, MatchID
)t1
inner join
(
select MatchID, max(goal_scored) as winner_goals_cnt from (
select TeamID, MatchID, count(*) as goal_scored from Goals
group by TeamID, MatchID
)t
group by MatchID
having min(goal_scored) <> max(goal_scored)
)t2
on t1.MatchID = t2.MatchID and t1.goal_scored = t2.winner_goals_cnt
) winners
inner join
(
select * from (
select Goals.*, row_number() over(partition by MatchID order by Minute, id) rn from Goals
) f
where rn = 1
) first_goals
on winners.MatchID = first_goals.MatchID and winners.TeamID <> first_goals.TeamID
inner join Teams
on winners.TeamID = Teams.id
inner join Managers
on winners.TeamID = Managers.TeamID
I don't follow much of football, but assuming that a comeback is when a team concedes the first goal but then wins the game, I would use the following logic to get the result.
First, we need the teams that had conceded the first goal in a match:
SELECT TeamID, MatchID FROM Goals WHERE GoalID in
(SELECT Min(GoalID) FROM Goals GROUP BY MatchID)
Second, for each match, we need the matches where the team making first goal went on to win.
SELECT MatchID FROM
(SELECT COUNT(GoalID) as TotalGoals, MatchID FROM Goals GROUP BY MatchID) AS MatchSummary
INNER JOIN
(SELECT COUNT(GoalID) as TeamGoals, MatchID FROM
(SELECT TeamID, MatchID FROM Goals WHERE GoalID in
(SELECT Min(GoalID) FROM Goals GROUP BY MatchID)
) as GoalsOfTheTeamThatConcededFirstGoal
GROUP BY MatchID) as SummaryOfTeamThatConcededFirstGoal
ON MatchSummary.MatchID = SummaryOfTeamThatConcededFirstGoal.MatchID
WHERE (TotalGoals - TeamGoals) < TeamGoals
Combining these two queries you will be able to get the TeamIDs for those teams that have made a comeback.
I think this task could be done much better with cursors or temporary tables, but I wanted to keep the answer as simple as possible. Hence I have avoided it. You should definitely explore those two methods.

Running Sum that resets to 0 on each new cluster of consecutives

I have tried and failed to adapt several running sum methods (remember I have to use SQL Server 2008, so it's a bit trickier than in 2012).
The goal is to have a running sum of Amount ordered by Date. Any time Category field changes value during that list, the sum should restart.
Table structure:
[Date], [Category], [Amount]
Example:
[Date], [Category], [Amount], [RunSumReset]
-------------------------------------------
1-Jan, catA, 10, 10
2-Jan, catA, 5, 15
3-Jan, catA, 15, 30
15-Jan, catB, 3, 3
1-Feb, catB, 6, 9
11-Feb, catA, 10, 10
12-Feb, catC, 2, 2
1-Apr, catA, 5, 5
Thanks so much for any slick tips or tricks
Using Version 2008 makes things a bit trickier since the window version of SUM with ORDER BY clause is not available.
One way to do it is:
WITH CTE AS (
SELECT [Date], Category, Amount,
ROW_NUMBER() OVER (ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY Category
ORDER BY [Date]) AS grp
FROM mytable
)
SELECT [Date], Category, Amount, Amount + COALESCE(t.s, 0) AS RunSumReset
FROM CTE AS c1
OUTER APPLY (
SELECT SUM(c2.Amount)
FROM CTE AS c2
WHERE c2.[Date] < c1.[Date] AND
c1.Category = c2.Category AND
c1.grp = c2.grp) AS t(s)
ORDER BY [Date]
The CTE is used to calculate field grp that identifies islands of consecutive records having the same Category. Once Category changes, grp value also changes. Using this CTE we can calculate the running total the way it is normally done in versions prior to SQL Server 2012, i.e. using OUTER APPLY.
Select sum of amounts in current row and up to first row that has different category. In your case you will need to replace NULL with some min date that SQL Server supports, like '17530101':
DECLARE #t TABLE
(
category INT ,
amount INT ,
ordering INT
)
INSERT INTO #t
VALUES ( 1, 1, 1 ),
( 1, 2, 2 ),
( 1, 3, 3 ),
( 2, 4, 4 ),
( 2, 5, 5 ),
( 3, 6, 6 ),
( 1, 7, 7 ),
( 1, 8, 8 ),
( 4, 9, 9 ),
( 1, 10, 10 )
SELECT category ,
amount ,
( SELECT SUM(amount)
FROM #t
WHERE category = t.category
AND ordering <= t.ordering
AND ordering > ( SELECT ISNULL(MAX(ordering), 0)
FROM #t
WHERE category <> t.category
AND ordering < t.ordering
)
) AS sum
FROM #t t
ORDER BY t.ordering
Output:
category amount sum
1 1 1
1 2 3
1 3 6
2 4 4
2 5 9
3 6 6
1 7 7
1 8 15
4 9 9
1 10 10

SQL grouping interescting/overlapping rows

I have the following table in Postgres that has overlapping data in the two columns a_sno and b_sno.
create table data
( a_sno integer not null,
b_sno integer not null,
PRIMARY KEY (a_sno,b_sno)
);
insert into data (a_sno,b_sno) values
( 4, 5 )
, ( 5, 4 )
, ( 5, 6 )
, ( 6, 5 )
, ( 6, 7 )
, ( 7, 6 )
, ( 9, 10)
, ( 9, 13)
, (10, 9 )
, (13, 9 )
, (10, 13)
, (13, 10)
, (10, 14)
, (14, 10)
, (13, 14)
, (14, 13)
, (11, 15)
, (15, 11);
As you can see from the first 6 rows data values 4,5,6 and 7 in the two columns intersects/overlaps that need to partitioned to a group. Same goes for rows 7-16 and rows 17-18 which will be labeled as group 2 and 3 respectively.
The resulting output should look like this:
group | value
------+------
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
Assuming that all pairs exists in their mirrored combination as well (4,5) and (5,4). But the following solutions work without mirrored dupes just as well.
Simple case
All connections can be lined up in a single ascending sequence and complications like I added in the fiddle are not possible, we can use this solution without duplicates in the rCTE:
I start by getting minimum a_sno per group, with the minimum associated b_sno:
SELECT row_number() OVER (ORDER BY a_sno) AS grp
, a_sno, min(b_sno) AS b_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno;
This only needs a single query level since a window function can be built on an aggregate:
Get the distinct sum of a joined table column
Result:
grp a_sno b_sno
1 4 5
2 9 10
3 11 15
I avoid branches and duplicated (multiplicated) rows - potentially much more expensive with long chains. I use ORDER BY b_sno LIMIT 1 in a correlated subquery to make this fly in a recursive CTE.
Create a unique index on a non-unique column
Key to performance is a matching index, which is already present provided by the PK constraint PRIMARY KEY (a_sno,b_sno): not the other way round (b_sno, a_sno):
Is a composite index also good for queries on the first field?
WITH RECURSIVE t AS (
SELECT row_number() OVER (ORDER BY d.a_sno) AS grp
, a_sno, min(b_sno) AS b_sno -- the smallest one
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
GROUP BY a_sno
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp
, (SELECT b_sno -- correlated subquery
FROM data
WHERE a_sno = c.sno
AND a_sno < b_sno
ORDER BY b_sno
LIMIT 1)
FROM cte c
WHERE c.sno IS NOT NULL
)
SELECT * FROM cte
WHERE sno IS NOT NULL -- eliminate row with NULL
UNION ALL -- no duplicates
SELECT grp, a_sno FROM t
ORDER BY grp, sno;
Less simple case
All nodes can be reached in ascending order with one or more branches from the root (smallest sno).
This time, get all greater sno and de-duplicate nodes that may be visited multiple times with UNION at the end:
WITH RECURSIVE t AS (
SELECT rank() OVER (ORDER BY d.a_sno) AS grp
, a_sno, b_sno -- get all rows for smallest a_sno
FROM data d
WHERE a_sno < b_sno
AND NOT EXISTS (
SELECT 1 FROM data
WHERE b_sno = d.a_sno
AND a_sno < b_sno
)
)
, cte AS (
SELECT grp, b_sno AS sno FROM t
UNION ALL
SELECT c.grp, d.b_sno
FROM cte c
JOIN data d ON d.a_sno = c.sno
AND d.a_sno < d.b_sno -- join to all connected rows
)
SELECT grp, sno FROM cte
UNION -- eliminate duplicates
SELECT grp, a_sno FROM t -- add first rows
ORDER BY grp, sno;
Unlike the first solution, we don't get a last row with NULL here (caused by the correlated subquery).
Both should perform very well - especially with long chains / many branches. Result as desired:
SQL Fiddle (with added rows to demonstrate difficulty).
Undirected graph
If there are local minima that cannot be reached from the root with ascending traversal, the above solutions won't work. Consider Farhęg's solution in this case.
I want to say another way, it may be useful, you can do it in 2 steps:
1. take the max(sno) per each group:
select q.sno,
row_number() over(order by q.sno) gn
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
result:
sno gn
7 1
14 2
15 3
2. use a recursive cte to find all related members in groups:
with recursive cte(sno,gn,path,cycle)as(
select q.sno,
row_number() over(order by q.sno) gn,
array[q.sno],false
from(
select distinct d.a_sno sno
from data d
where not exists (
select b_sno
from data
where b_sno=d.a_sno
and a_sno>d.a_sno
)
)q
union all
select d.a_sno,c.gn,
d.a_sno || c.path,
d.a_sno=any(c.path)
from data d
join cte c on d.b_sno=c.sno
where not cycle
)
select distinct gn,sno from cte
order by gn,sno
Result:
gn sno
1 4
1 5
1 6
1 7
2 9
2 10
2 13
2 14
3 11
3 15
here is the demo of what I did.
Here is a start that may give some ideas on an approach. The recursive query starts with a_sno of each record and then tries to follow the path of b_sno until it reaches the end or forms a cycle. The path is represented by an array of sno integers.
The unnest function will break the array into rows, so a sno value mapped to the path array such as:
4, {6, 5, 4}
will be transformed to a row for each value in the array:
4, 6
4, 5
4, 4
The array_agg then reverses the operation by aggregating the values back into a path, but getting rid of the duplicates and ordering.
Now each a_sno is associated with a path and the path forms the grouping. dense_rank can be used to map the grouping (cluster) to a numeric.
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
Output:
cluster | sno
--------------+-----
{4,5,6,7} | 4
{4,5,6,7} | 5
{4,5,6,7} | 6
{4,5,6,7} | 7
{9,10,13,14} | 9
{9,10,13,14} | 10
{9,10,13,14} | 13
{9,10,13,14} | 14
{11,15} | 11
{11,15} | 15
(10 rows)
Wrap it one more time for the ranking:
SELECT dense_rank() OVER(order by cluster) AS rank
,sno
FROM (
SELECT array_agg(DISTINCT map ORDER BY map) AS cluster
,sno
FROM ( WITH RECURSIVE x(sno, path, cycle) AS (
SELECT a_sno, ARRAY[a_sno], false FROM data
UNION ALL
SELECT b_sno, path || b_sno, b_sno = ANY(path)
FROM data, x
WHERE a_sno = x.sno
AND NOT cycle
)
SELECT sno, unnest(path) AS map FROM x ORDER BY 1
) y
GROUP BY sno
ORDER BY 1, 2
) z
Output:
rank | sno
------+-----
1 | 4
1 | 5
1 | 6
1 | 7
2 | 9
2 | 10
2 | 13
2 | 14
3 | 11
3 | 15
(10 rows)

SQL group uniquely by type and by position

Given this dataset:
ID type_id Position
1 2 7
2 1 2
3 3 5
4 1 1
5 3 3
6 2 4
7 2 6
8 3 8
(There are only 3 different possible type_ids) I'd like to return a dataset with one of each type_id in groups, ordered by position.
so it would be grouped like so:
Results (ID): [4, 6, 5], [2, 7, 3], [null, 1, 8]
So the first group would consist of each of the entries type_id's with the highest (Relative) position score, the second group would have the second highest score, the third would only consist of two entries (and a null) because there are not three more of each type_id
Does this make sense? And is it possible?
something like that:
with CTE as (
select
row_number() over (partition by type_id order by Position) as row_num,
*
from test
)
select array_agg(ID order by type_id)
from CTE
group by row_num
SQL FIDDLE
of, if you absolutely need nulls in your arrays:
with CTE as (
select
row_number() over (partition by type_id order by Position) as row_num,
*
from test
)
select array_agg(c.ID order by t.type_id)
from (select distinct row_num from CTE) as a
cross join (select distinct type_id from test) as t
left outer join CTE as c on c.row_num = a.row_num and c.type_id = t.type_id
group by a.row_num
SQL FIDDLE