Limiting recursion to certain level - Duplicate rows - sql

To start, this isn't a duplicate of my other question with the same name; I just couldn't think of a better name for this one!
I have a SQL table named Player and another called Unit.
Each Player MUST belong to a team via a foreign key UnitID.
Each Unit can belong to another Team via a recursive field ParentUnitID.
A ParentUnit CAN be a ParentUnit (infinite recursion), but a Player can only belong to one Team.
A Unit may have many children
So it could be (top down)...
TeamA (is the top level)
TeamB (belongs to ^^)
TeamC (belongs to ^^)
TeamD (belongs to ^^)
Player_1 (belongs to ^^)
My question is, if I'm given a Player's PlayerID (the PK for that table), what is the best way to get a specific team?
See my SQLFiddle for the data, structure and a query: http://sqlfiddle.com/#!3/78965/3
With my data (from the fiddle), I want to be able to get every player under "TeamB", but then "Player4"'s top unit should be "TeamC". In order to do that, all I want to pass in is the PlayerID and the ID for "TeamB". So I'm saying "get all players and top units under TeamB", and then filtering out all players except Player4.
EDIT: I believe the above paragraph should read: With my data (from the fiddle), I want to be able to establish the top team(s) that plays below "TeamB". For each top team below "TeamB" I then want to establish all players that play for or below that team. I then want the ability to limit the list of players to one or more specific players.
As you can see in the SQLFiddle, I get back multiple rows for each player. I'm sure it's a quick fix, but I can't figure it out. That's what I'm going for in my fiddle, but it's, well, uh, a bit fiddly... :)
Edit: More Information:
OK, so imagine this is a stored procedure.
I pass in PlayerIDs 1,2,3,4
I pass in UnitID 2
I would expect the returned data to look something like
Player3, TeamC
Only Player3 is returned as it is the only player which is a descendant of TeamB (ID 2), and TeamC is returned as it is the highest level unit (below UnitID 2) which Player3 belongs to.
If I instead passed in:
I pass in PlayerIDs 1,2,3,4
I pass in UnitID 6
I would expect
Player1, Team2
Player2, Team2

This answer has been completely rewritten. The original did not quite work in all circumstances.
I had to change the CTE to represent the full Unit hierarchy for every Unit as a possible root (top unit). It allows a true hierarchy with multiple children per Unit.
I extended the sample data in this SQL Fiddle to have a player assigned to both units 11 and 12. It properly returns the correct row for each of 3 players that play for a Unit at some level beneath Unit 1.
The "root" Unit ID and the list of player IDs is conveniently in the outermost WHERE clause at the bottom, making it easy to change the IDs as needed.
with UnitCTE as (
select u.UnitID,
u.Designation UnitDesignation,
u.ParentUnitID as ParentUnitID,
p.Designation as ParentUnitDesignation,
u.UnitID TopUnitID,
u.Designation TopUnitDesignation,
1 as TeamLevel
from Unit u
left outer join Unit p
on u.ParentUnitId = p.UnitID
union all
select t.UnitID,
t.Designation UnitDesignation,
c.UnitID as ParentUnitID,
c.UnitDesignation as ParentUnitDesignation,
c.TopUnitID,
c.TopUnitDesignation,
TeamLevel+1 as TeamLevel
from Unit t
join UnitCTE c
on t.ParentUnitID = c.UnitID
)
select p.PlayerID,
p.Designation,
t1.*
from UnitCTE t1
join UnitCTE t2
on t2.TopUnitID = t1.UnitID
and t2.TopUnitID = t1.TopUnitID
join Player p
on p.UnitID = t2.UnitID
where t1.ParentUnitID = 1
and playerID in (1,2,3,4,5,6)
Here is a slightly optimized version that has the Unit ID criteria embedded in the CTE. The CTE only computes hierarchies rooted on Units where Parent ID is the chosen Unit ID (1 in this case)
with UnitCTE as (
select u.UnitID,
u.Designation UnitDesignation,
u.ParentUnitID as ParentUnitID,
p.Designation as ParentUnitDesignation,
u.UnitID TopUnitID,
u.Designation TopUnitDesignation,
1 as TeamLevel
from Unit u
left outer join Unit p
on u.ParentUnitId = p.UnitID
where u.ParentUnitID = 1
union all
select t.UnitID,
t.Designation UnitDesignation,
c.UnitID as ParentUnitID,
c.UnitDesignation as ParentUnitDesignation,
c.TopUnitID,
c.TopUnitDesignation,
TeamLevel+1 as TeamLevel
from Unit t
join UnitCTE c
on t.ParentUnitID = c.UnitID
)
select p.PlayerID,
p.Designation,
t1.*
from UnitCTE t1
join UnitCTE t2
on t2.TopUnitID = t1.UnitID
join Player p
on p.UnitID = t2.UnitID
where playerID in (1,2,3,4,5,6)
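The idea behind the optimized version (seed the CTE with the direct children of the chosen unit and carry each child's identity down its subtree) can be tried out without SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module. The table contents are hypothetical (they are not the fiddle's data), the helper names build_demo_db and top_units_below are made up for illustration, and SQLite needs the RECURSIVE keyword where T-SQL does not.

```python
import sqlite3

def build_demo_db():
    """Create a small in-memory hierarchy (hypothetical data, not the fiddle's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Unit (UnitID INTEGER PRIMARY KEY, Designation TEXT, ParentUnitID INTEGER);
        CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, Designation TEXT, UnitID INTEGER);
        -- TeamA > TeamB > (TeamC > TeamD, TeamE)
        INSERT INTO Unit VALUES (1,'TeamA',NULL),(2,'TeamB',1),(3,'TeamC',2),
                                (4,'TeamD',3),(5,'TeamE',2);
        INSERT INTO Player VALUES (1,'Player1',1),(2,'Player2',2),
                                  (3,'Player3',4),(4,'Player4',5);
    """)
    return conn

def top_units_below(conn, root_unit_id, player_ids):
    """Return (player, top unit directly below root_unit_id) for the given players."""
    marks = ",".join("?" for _ in player_ids)
    sql = f"""
        WITH RECURSIVE UnitCTE (UnitID, TopUnitDesignation) AS (
            -- anchor: only the direct children of the chosen root unit
            SELECT UnitID, Designation FROM Unit WHERE ParentUnitID = ?
            UNION ALL
            -- recursion: descendants inherit the top unit of their parent
            SELECT u.UnitID, c.TopUnitDesignation
            FROM Unit u JOIN UnitCTE c ON u.ParentUnitID = c.UnitID
        )
        SELECT p.Designation, c.TopUnitDesignation
        FROM Player p JOIN UnitCTE c ON p.UnitID = c.UnitID
        WHERE p.PlayerID IN ({marks})
        ORDER BY p.PlayerID
    """
    return conn.execute(sql, [root_unit_id, *player_ids]).fetchall()
```

Because every unit has a single parent, each unit appears in the CTE at most once, so each player comes back on exactly one row. Players attached to the root unit itself (Player2 in this made-up data, when the root is TeamB) are not returned, which matches the "below TeamB" wording of the question.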
Here is my original answer. It only works if the Unit hierarchy is constrained to allow only one child per Unit. The SQL Fiddle example in the question has 3 children for Unit 1, so it falsely returns multiple rows for Players 3, 5 and 6 if run against Unit 1.
Here is a SQL Fiddle that demonstrates the problem.
with UnitCTE as (
select UnitID,
Designation UnitDesignation,
ParentUnitID as ParentUnitID,
cast(null as varchar(50)) as ParentUnitDesignation,
UnitID TopUnitID,
Designation TopUnitDesignation,
1 as TeamLevel
from Unit
where ParentUnitID is null
union all
select t.UnitID,
t.Designation UnitDesignation,
c.UnitID,
c.UnitDesignation,
c.TopUnitID,
c.TopUnitDesignation,
TeamLevel+1 as TeamLevel
from Unit t
join UnitCTE c
on t.ParentUnitID = c.UnitID
)
select p.PlayerID,
p.Designation,
t2.*
from Player p
join UnitCTE t1
on p.UnitID = t1.UnitID
join UnitCTE t2
on t2.TopUnitID = t1.TopUnitID
and t1.TeamLevel >= t2.TeamLevel
join UnitCTE t3
on t3.TopUnitID = t1.TopUnitID
and t2.TeamLevel = t3.TeamLevel+1
where t3.UnitID = 2
and playerID in (1,2,3,4)

with UnitCTE as (
select UnitID,
Designation,
ParentUnitID as ParentUnitID,
cast(null as varchar(50)) as ParentUnitDesignation,
UnitID TopUnitID,
Designation TopUnitDesignation,
1 as TeamLevel
from Unit
where ParentUnitID is null
union all
select t.UnitID,
t.Designation,
c.UnitID,
c.Designation,
c.TopUnitID,
c.TopUnitDesignation,
TeamLevel+1 as TeamLevel
from Unit t
join UnitCTE c
on t.ParentUnitID = c.UnitID
--WHERE t.UnitID = 1
),
x AS (
select Player.PlayerID,
pDesignation = Player.Designation, t1.*,
rn = ROW_NUMBER() OVER (PARTITION BY Player.PlayerID ORDER BY Player.Designation)
from Player
join UnitCTE t1
on Player.UnitID = t1.UnitID
join UnitCTE t2
on t1.TopUnitID = t2.TopUnitID
and t2.TeamLevel=2
)
SELECT * FROM x
WHERE rn = 1
ORDER BY TeamLevel

Related

Can I use the ALL operator to test that all values of a "group" exist in another query/set? If so, how?

I have this database for a CS/database-theory homework question for a hypothetical movie store company:
For those who might be unfamiliar with the concept, a movie store is a retail location where patrons can rent film productions on VHS tape, or this newfangled format called "DVD".
Who knows, maybe some time in the distant, far-off, future we might be able to view movies directly over the Internet?
The DDL and sample data is below.
I need to write a query that will show all movies that are available in all three Chicago stores: (WI01, WI02, and WI03).
By looking at the raw data below ourselves, we can see that only these 3 movieId values (D00001, D00006, and D00007) have movie_store rows for every store located in Chicago.
CREATE TABLE movie (
movieId varchar(6) NOT NULL PRIMARY KEY,
title nvarchar(50) NOT NULL
);
CREATE TABLE store (
storeId varchar(4) NOT NULL PRIMARY KEY,
city nvarchar(20) NOT NULL
);
CREATE TABLE movie_store (
movieid varchar(6) FOREIGN KEY REFERENCES movie ( movieId ),
storeid varchar(4) FOREIGN KEY REFERENCES store ( storeId ),
PRIMARY KEY ( movieId, storeId )
);
GO
INSERT INTO movie ( movieId, title )
VALUES
('D00001', N'True Lies'),
('D00002', N'Predator'),
('D00003', N'Last Action Hero'),
('D00004', N'Red Heat'),
('D00005', N'Conan 1'),
('D00006', N'Conan 2'),
('D00007', N'Red Sonja');
INSERT INTO store ( storeId, city ) VALUES
('WI01', N'Chicago'),
('WI02', N'Chicago'),
('WI03', N'Chicago'),
('IL01', N'Atlanta'),
('IL02', N'Nashville');
INSERT INTO movie_store ( movieId, storeId ) VALUES
-- True Lies:
('D00001', 'WI01'),
('D00001', 'WI02'),
('D00001', 'WI03'),
-- Predator:
('D00002', 'IL01'),
('D00002', 'IL02'),
-- Last Action Hero:
('D00003', 'WI01'),
-- Red Heat:
('D00004', 'WI01'),
('D00004', 'WI02'),
('D00004', 'IL02'),
-- Conan 1:
('D00005', 'WI01'),
('D00005', 'WI02'),
-- Conan 2:
('D00006', 'WI01'),
('D00006', 'WI02'),
('D00006', 'WI03'),
-- Red Sonja:
('D00007', 'WI01'),
('D00007', 'WI02'),
('D00007', 'WI03');
During my problem-solving research I found a site explaining the ALL operator.
My query is getting unique storeIds for Chicago.
It is then trying to get the movie title with a storeId record for each one of the Chicago locations.
WITH chicagoStores AS (
SELECT DISTINCT
storeId
FROM
store
WHERE
city = 'Chicago'
)
SELECT DISTINCT
m.title
FROM
movie AS m
INNER JOIN movie_store AS y ON m.movieid = y.movieid
INNER JOIN store AS s ON y.storeid = s.storeid
WHERE
s.storeId = ALL( SELECT storeId FROM chicagoStores )
But my query returns zero rows (and no errors). Am I misunderstanding the ALL operator?
Try this
select city,title,count(*)
from
store
inner join movie_store
on store.storeid = movie_store.storeid
inner join movie
on movie_store.movieid = movie.movieid
where city = 'Chicago'
group by city,title
having count(*) = 3
It appears I had the wrong idea about ALL. I realized I could write my query this way to get the movies that appeared in all Chicago locations. Thanks for your help everyone.
with stores as (
select count(distinct(storeid)) as count_store
from store
where city = 'Chicago'
),
count_movies as (
select z.title, count(*) as count
from movie z
join movie_store y on (z.movieid = y.movieid)
join store x on (y.storeid = x.storeid)
where x.city = 'Chicago'
group by z.title
having count(*) = (select count_store from stores)
)
select title from count_movies
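The counting idea above (a movie qualifies when its number of Chicago stores equals the total number of Chicago stores) can be checked against the question's sample data. This is a minimal sketch using Python's sqlite3 module rather than SQL Server; the helper names build_movie_db and movies_in_every_store are hypothetical, and COUNT(DISTINCT ...) is used as a precaution even though the primary key already prevents duplicate (movieId, storeId) rows.

```python
import sqlite3

STORES = [("WI01", "Chicago"), ("WI02", "Chicago"), ("WI03", "Chicago"),
          ("IL01", "Atlanta"), ("IL02", "Nashville")]
MOVIES = [("D00001", "True Lies"), ("D00002", "Predator"), ("D00003", "Last Action Hero"),
          ("D00004", "Red Heat"), ("D00005", "Conan 1"), ("D00006", "Conan 2"),
          ("D00007", "Red Sonja")]
MOVIE_STORE = [("D00001", "WI01"), ("D00001", "WI02"), ("D00001", "WI03"),
               ("D00002", "IL01"), ("D00002", "IL02"), ("D00003", "WI01"),
               ("D00004", "WI01"), ("D00004", "WI02"), ("D00004", "IL02"),
               ("D00005", "WI01"), ("D00005", "WI02"),
               ("D00006", "WI01"), ("D00006", "WI02"), ("D00006", "WI03"),
               ("D00007", "WI01"), ("D00007", "WI02"), ("D00007", "WI03")]

def build_movie_db():
    """Load the question's sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movie (movieId TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE store (storeId TEXT PRIMARY KEY, city TEXT NOT NULL);
        CREATE TABLE movie_store (movieId TEXT, storeId TEXT, PRIMARY KEY (movieId, storeId));
    """)
    conn.executemany("INSERT INTO movie VALUES (?, ?)", MOVIES)
    conn.executemany("INSERT INTO store VALUES (?, ?)", STORES)
    conn.executemany("INSERT INTO movie_store VALUES (?, ?)", MOVIE_STORE)
    return conn

def movies_in_every_store(conn, city):
    """Titles stocked by every store in the given city (division by counting)."""
    sql = """
        SELECT m.title
        FROM movie m
        JOIN movie_store ms ON ms.movieId = m.movieId
        JOIN store s ON s.storeId = ms.storeId
        WHERE s.city = ?
        GROUP BY m.movieId, m.title
        HAVING COUNT(DISTINCT ms.storeId) = (SELECT COUNT(*) FROM store WHERE city = ?)
        ORDER BY m.movieId
    """
    return [row[0] for row in conn.execute(sql, (city, city))]
```

Unlike a hard-coded `having count(*) = 3`, comparing against a subquery keeps working if a fourth Chicago store is added. A city with no stores simply yields no rows, since nothing survives the join.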
This can be done without a window function and without mentioning Chicago twice:
select m.title,
s.city
from store s
inner join movie_store ms on s.storeid = ms.storeid
inner join movie m on ms.movieid = m.movieid
cross apply (select count(1) numbers from store s2 where s2.city = s.city group by s2.city) c
where s.city = 'Chicago'
group by m.title, s.city, c.numbers
having count(s.storeid) = c.numbers
Try it yourself at this DBFiddle
After plugging-away for a bit more, and reading this article on Relational Division, I think I've found a much shorter query compared to my original answer:
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
SELECT m.movieId, s.storeId FROM store AS s WHERE city = 'Chicago'
EXCEPT
SELECT r.movieId, r.storeId FROM movie_store AS r WHERE r.movieId = m.movieId
);
The explanation below is based on this slightly longer-form version which is otherwise identical in its relational-calculus:
WITH chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
)
SELECT
m.movieId,
m.title
FROM
movie AS m
WHERE
NOT EXISTS (
-- Generate rows we'd *expect to exist* if a given `m.movieId` is present in every Chicago storeId:
SELECT
m.movieId,
s.storeId
FROM
chicagoStores AS s
-- Then subtract the `movie_store` rows that *actually exist* for this m.movieId:
EXCEPT
SELECT
a.movieId,
a.storeId
FROM
movie_store AS a
WHERE
a.movieId = m.movieId
);
Generate the same chicagoStores set.
Then, filter movieId values in movie (in WHERE) by...
Generate a set of hypothetical ( movieId, storeId ) tuples (for that specific movieId value) from chicagoStores...
...and subtract the actual rows in movie_store from it (using EXCEPT)...
This is the same thing as whatIfEveryChicagoStoreHadEveryMovie from my original answer, but computed on a per-row basis, instead of (conceptually) generating the ( movies X stores ) EXCEPT movie_store Cartesian Product subtraction all-at-once.
...with the implication that this might require less maximum total memory on the database-server, but that's entirely dependent on the execution-plan and DB engine.
...and if there are any hypothetical rows remaining (using EXISTS) after the EXCEPT then it means that the movieId is not available in all Chicagoan stores.
So if we invert the predicate (NOT EXISTS) then that means we can test to see if a specific movieId value is available in all Chicagoan stores.
But be careful, as it would also have false-positives when/if the chicagoStores collection was empty (as NOT EXISTS ( x EXCEPT y ) is always true when x is empty, even if y is also empty).
As a workaround, change FROM movie to FROM anyChicagoStoreMovieIds (where anyChicagoStoreMovieIds is defined in my other answer).
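The per-movie logic described above, including the empty-set caveat, can be mirrored with plain Python sets. This is only an illustrative sketch: the function name available_everywhere is hypothetical, set difference stands in for EXCEPT, and the data is copied from the question's INSERT statements.

```python
CHICAGO_STORES = {"WI01", "WI02", "WI03"}
MOVIE_IDS = ["D00001", "D00002", "D00003", "D00004", "D00005", "D00006", "D00007"]
MOVIE_STORE = {
    ("D00001", "WI01"), ("D00001", "WI02"), ("D00001", "WI03"),
    ("D00002", "IL01"), ("D00002", "IL02"),
    ("D00003", "WI01"),
    ("D00004", "WI01"), ("D00004", "WI02"), ("D00004", "IL02"),
    ("D00005", "WI01"), ("D00005", "WI02"),
    ("D00006", "WI01"), ("D00006", "WI02"), ("D00006", "WI03"),
    ("D00007", "WI01"), ("D00007", "WI02"), ("D00007", "WI03"),
}

def available_everywhere(movie_ids, target_stores, movie_store):
    """Keep a movie when (expected rows EXCEPT actual rows) is empty."""
    result = []
    for movie_id in movie_ids:
        # rows we would expect if this movie were in every target store
        expected = {(movie_id, store_id) for store_id in target_stores}
        # rows that actually exist for this movie
        actual = {(m, s) for (m, s) in movie_store if m == movie_id}
        if not (expected - actual):  # NOT EXISTS ( expected EXCEPT actual )
            result.append(movie_id)
    return result

print(available_everywhere(MOVIE_IDS, CHICAGO_STORES, MOVIE_STORE))
# With no target stores, every movie passes vacuously (the false-positive case).
print(available_everywhere(MOVIE_IDS, set(), MOVIE_STORE))
```

The second call shows why the workaround matters: with an empty set of stores the "expected" set is empty for every movie, so nothing is ever missing and all seven IDs come back.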
The x = ALL y operator isn't what you want.
It sounds like you want an operator to test that "all values in x are also in y", aka an "ALL IN" operator.
Unfortunately ALL IN does not exist in SQL, despite its obvious utility.
The x = ALL y operator actually tests to see if all values in y (the right-hand-side single-column table-expression) equal the single scalar value x.
This functionality is not relevant to the problem as we don't need to test that some result-list equals some single row's column.
The other operators aren't of much use either (e.g. x != ALL y or x < ALL y).
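To see why the question's query returns nothing, it may help to model x = ALL y in a few lines of Python. This is a rough sketch that ignores SQL's NULL handling; equals_all is a hypothetical helper, not a library function.

```python
def equals_all(x, ys):
    """Rough model of SQL's `x = ALL (subquery)`, ignoring NULLs."""
    return all(x == y for y in ys)

chicago_stores = ["WI01", "WI02", "WI03"]
joined_store_ids = ["WI01", "WI02", "WI03", "IL01", "IL02"]

# Each row carries one storeId, and a single value cannot equal
# three different values at once, so the WHERE clause rejects every row.
surviving = [s for s in joined_store_ids if equals_all(s, chicago_stores)]
print(surviving)

# The predicate only passes when the right-hand side holds one distinct value
# (or is empty, in which case it is vacuously true).
print(equals_all("WI01", ["WI01"]), equals_all("WI01", []))
```

So the operator is evaluated per row against the whole list; it never looks across the rows of a group, which is what the question actually needs.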
While there is no ALL IN operator in SQL, the concept exists in relational-algebra where it's known as relational division.
Conceptually (and if it existed in SQL) then x DIVIDE y ON y.foo = x.foo would be like GROUP BY x.foo HAVING x.foo ALL IN ( SELECT foo FROM y ) UNGROUP (of course, UNGROUP isn't a thing either).
It's a common PITA to implement relational-division in SQL.
From a high-level, the problem can be broken-down into 4 steps:
Get the set of storeId values for Stores in Chicago.
i.e. SELECT storeId FROM store WHERE city = 'Chicago'
Separately, get the set of movieId values for all movies in those stores.
i.e. SELECT * FROM movie_store AS ms INNER JOIN chicagoStores AS cs ON ms.storeId = cs.storeId
Then group/partition the chicagoMovies set by their separate movieId values.
Then filter out those groups/partitions where each partition's set of storeId values does not equal the chicagoStores set.
But here's the hard-part: SQL does not offer a way to evaluate a predicate condition for each partition in a GROUP BY query.
SQL is more like Relational Calculus, where you describe what you want, as opposed to Relational Algebra, where you describe how you want it done. Linq is an example of a relational-algebra query language in comparison to SQL.
In Linq (for in-memory objects, not Entity Framework), you would do it like this:
HashSet<StoreId> chicagoStores = Stores
.Where( s => s.City == "Chicago" )
.Select( s => s.StoreId )
.ToHashSet();
// Group by movie (not by store), then keep the movies whose stores cover every Chicago store.
MovieStores
.GroupBy( ms => ms.MovieId )
.Where( grp => chicagoStores.IsSubsetOf( grp.Select( ms => ms.StoreId ) ) )
.Select( grp => grp.First().Movie );
So I went for a completely different approach in SQL:
Get the set of storeId values for Stores in Chicago:
WITH chicagoStores AS (
SELECT
storeId
FROM
store
WHERE
city = 'Chicago'
)
StoreId
-------
WI01
WI02
WI03
Get the set of movieId values for movies that are in at least 1 Chicago store.
WITH anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON
cs.storeId = ms.storeId
)
movieId
-------
D00001
D00003
D00004
D00005
D00006
D00007
Alternatively, the set of all movieId values in movie could be used, but doing that would make Step 3 potentially much slower.
Generate the CROSS JOIN of Step 1 and Step 2, which generates the Cartesian Product of every Chicagoan storeId with every movieId.
Hence why restricting it to the smaller upper-bound set from Step 2 makes sense, as there's no point including movieId values that don't appear in any Chicago store.
WITH whatIfEveryChicagoStoreHadEveryMovie AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
)
movieId storeId
-------
D00001 WI01
D00001 WI02
D00001 WI03
D00003 WI01
D00003 WI02
D00003 WI03
D00004 WI01
D00004 WI02
D00004 WI03
D00005 WI01
D00005 WI02
D00005 WI03
D00006 WI01
D00006 WI02
D00006 WI03
D00007 WI01
D00007 WI02
D00007 WI03
Now the hard part:
Consider that if any given Chicago storeId had every movieId possible, then such a row would already-exist in movie_store...
...therefore it follows that if we then subtract actual rows in movie_store from Step 3's result, then we'll be left with the set of ( movieId, storeId ) tuples that don't exist but which would need to exist in order for every Chicago storeId to have that movie.
WITH chicagoanMoviesNotAvailableAtEveryChicagoanStore AS (
SELECT
w.movieId,
w.storeId
FROM
whatIfEveryChicagoStoreHadEveryMovie AS w
LEFT OUTER JOIN movie_store AS ms ON
w.movieId = ms.movieId
AND
w.storeId = ms.storeId
WHERE
ms.storeId IS NULL
)
movieId storeId
----------------
D00003 WI02
D00003 WI03
D00004 WI03
D00005 WI03
Then it's just a matter of subtracting chicagoanMoviesNotAvailableAtEveryChicagoanStore from anyChicagoStoreMovieIds (from Step 2), which gives us the set of movieId values that are available at every Chicagoan storeId:
WITH moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
SELECT movieId FROM anyChicagoStoreMovieIds
EXCEPT
SELECT movieId FROM chicagoanMoviesNotAvailableAtEveryChicagoanStore
)
movieId
-------
D00001
D00006
D00007
Which can then be INNER JOINed with movie to get their title information, etc:
SELECT
m.movieId,
m.title
FROM
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
movieId title
----------------
D00001 'True Lies'
D00006 'Conan 2'
D00007 'Red Sonja'
Thus giving the full final query:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
),
moviesNotInAtLeast1ChicagoStore AS (
SELECT
e.*
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
),
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS (
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
movieId
FROM
moviesNotInAtLeast1ChicagoStore
)
SELECT
m.movieId,
m.title
FROM
moviesNotIn_moviesNotInAtLeast1ChicagoStore AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
The CTEs that are only used once can be inlined to shorten the query somewhat:
WITH
chicagoStores AS (
SELECT storeId FROM store WHERE city = 'Chicago'
),
anyChicagoStoreMovieIds AS (
SELECT DISTINCT
ms.movieId
FROM
movie_store AS ms
INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
),
expectedMovieStores AS (
SELECT
m.movieId,
cs.storeId
FROM
chicagoStores AS cs
CROSS JOIN anyChicagoStoreMovieIds AS m
)
SELECT
m.movieId,
m.title
FROM
(
SELECT
movieId
FROM
anyChicagoStoreMovieIds
EXCEPT
SELECT
e.movieId
FROM
expectedMovieStores AS e
LEFT OUTER JOIN movie_store AS ms ON
e.movieId = ms.movieId
AND
e.storeId = ms.storeId
WHERE
ms.storeId IS NULL
) AS ffs
INNER JOIN movie AS m ON
ffs.movieId = m.movieId;
...which is still rather long and complex for what's described as CS/SQL homework. This took me over 2 hours to figure out because it was driving me mad.
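For anyone who wants to watch the intermediate steps without a SQL Server instance, the same pipeline can be replayed in SQLite through Python's sqlite3 module. This is a sketch under assumptions: plain string literals replace the N'...' literals, the helper names build_movie_db, missing_pairs and movies_everywhere are hypothetical, and the CTE names follow the full final query above.

```python
import sqlite3

MOVIES = [("D00001", "True Lies"), ("D00002", "Predator"), ("D00003", "Last Action Hero"),
          ("D00004", "Red Heat"), ("D00005", "Conan 1"), ("D00006", "Conan 2"),
          ("D00007", "Red Sonja")]
STORES = [("WI01", "Chicago"), ("WI02", "Chicago"), ("WI03", "Chicago"),
          ("IL01", "Atlanta"), ("IL02", "Nashville")]
STOCK = {"D00001": ["WI01", "WI02", "WI03"], "D00002": ["IL01", "IL02"],
         "D00003": ["WI01"], "D00004": ["WI01", "WI02", "IL02"],
         "D00005": ["WI01", "WI02"], "D00006": ["WI01", "WI02", "WI03"],
         "D00007": ["WI01", "WI02", "WI03"]}

# Steps 1 to 3: Chicago stores, movies seen in any of them, and their cross join.
CTES = """
WITH chicagoStores AS (
    SELECT storeId FROM store WHERE city = 'Chicago'
), anyChicagoStoreMovieIds AS (
    SELECT DISTINCT ms.movieId
    FROM movie_store AS ms
    INNER JOIN chicagoStores AS cs ON cs.storeId = ms.storeId
), expectedMovieStores AS (
    SELECT m.movieId, cs.storeId
    FROM chicagoStores AS cs CROSS JOIN anyChicagoStoreMovieIds AS m
), moviesNotInAtLeast1ChicagoStore AS (
    SELECT e.movieId, e.storeId
    FROM expectedMovieStores AS e
    LEFT OUTER JOIN movie_store AS ms
        ON e.movieId = ms.movieId AND e.storeId = ms.storeId
    WHERE ms.storeId IS NULL
)
"""

def build_movie_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE movie (movieId TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE store (storeId TEXT PRIMARY KEY, city TEXT NOT NULL);
        CREATE TABLE movie_store (movieId TEXT, storeId TEXT, PRIMARY KEY (movieId, storeId));
    """)
    conn.executemany("INSERT INTO movie VALUES (?, ?)", MOVIES)
    conn.executemany("INSERT INTO store VALUES (?, ?)", STORES)
    conn.executemany("INSERT INTO movie_store VALUES (?, ?)",
                     [(m, s) for m, stores in STOCK.items() for s in stores])
    return conn

def missing_pairs(conn):
    """Step 4: expected (movieId, storeId) rows that do not actually exist."""
    sql = CTES + "SELECT movieId, storeId FROM moviesNotInAtLeast1ChicagoStore ORDER BY 1, 2"
    return conn.execute(sql).fetchall()

def movies_everywhere(conn):
    """Step 5: candidate movies minus those with at least one missing row."""
    sql = CTES + """
        SELECT m.movieId, m.title
        FROM (
            SELECT movieId FROM anyChicagoStoreMovieIds
            EXCEPT
            SELECT movieId FROM moviesNotInAtLeast1ChicagoStore
        ) AS ffs
        INNER JOIN movie AS m ON ffs.movieId = m.movieId
        ORDER BY m.movieId
    """
    return conn.execute(sql).fetchall()
```

Calling missing_pairs reproduces the four "should exist but doesn't" rows from Step 4, and movies_everywhere returns the three titles listed in the final result.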

Write an appropriate SQL query to determine which album has the longest play length (total track duration)

I tried to do the same thing my lecturer did, but it doesn't work.
Table inputs:
Queries:
SELECT T1.TRACK_NAME, T1.TRACK_DURATION
FROM TRACKS T1
INNER JOIN TRACKS T2 ON T1.TRACK_ID = T2.TRACK_ID
WHERE T1.TRACK_DURATION = T2.TRACK_DURATION
ORDER BY T1.TRACK_DURATION;
They show all the tracks and duration
The exercise is that:
3
But the sample results must be this way:
TRACK_NAME TRACK_DURATION
Find You 3,5
Friends. 3,5
Silence 3,5
Rain 4
What About Us 4
Based on the title of your question, this query will give you the title and total duration of the album with the longest total duration of its tracks:
SELECT TOP 1 ALBUM_TITLE, SUM(TRACK_DURATION) AS TotalDuration
FROM ALBUMS a
INNER JOIN ALBUM_TRACKS at ON at.ALBUM_ID = a.ALBUM_ID
INNER JOIN TRACKS t ON t.TRACK_ID = at.TRACK_ID
GROUP BY ALBUM_TITLE
ORDER BY SUM(TRACK_DURATION) DESC
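The question's table contents were not included, so the following is only a sketch with made-up albums, run in SQLite through Python's sqlite3 module. The table and column names come from the query above; the album titles, the track-to-album assignment and the helper names are assumptions, and LIMIT 1 stands in for TOP 1, which SQLite does not support.

```python
import sqlite3

def build_music_db():
    """Hypothetical sample data; only the track names and durations echo the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ALBUMS (ALBUM_ID INTEGER PRIMARY KEY, ALBUM_TITLE TEXT);
        CREATE TABLE TRACKS (TRACK_ID INTEGER PRIMARY KEY, TRACK_NAME TEXT, TRACK_DURATION REAL);
        CREATE TABLE ALBUM_TRACKS (ALBUM_ID INTEGER, TRACK_ID INTEGER);
        INSERT INTO ALBUMS VALUES (1, 'Alpha'), (2, 'Beta');
        INSERT INTO TRACKS VALUES (1, 'Find You', 3.5), (2, 'Rain', 4.0),
                                  (3, 'Friends.', 3.5), (4, 'Silence', 3.5),
                                  (5, 'What About Us', 4.0);
        INSERT INTO ALBUM_TRACKS VALUES (1, 1), (1, 2), (2, 3), (2, 4), (2, 5);
    """)
    return conn

def longest_album(conn):
    """Return (title, total duration) of the album with the largest track total."""
    sql = """
        SELECT a.ALBUM_TITLE, SUM(t.TRACK_DURATION) AS TotalDuration
        FROM ALBUMS a
        INNER JOIN ALBUM_TRACKS atr ON atr.ALBUM_ID = a.ALBUM_ID
        INNER JOIN TRACKS t ON t.TRACK_ID = atr.TRACK_ID
        GROUP BY a.ALBUM_ID, a.ALBUM_TITLE
        ORDER BY TotalDuration DESC
        LIMIT 1
    """
    return conn.execute(sql).fetchone()
```

Grouping by ALBUM_ID as well as the title keeps two albums that happen to share a title apart. If two albums tie for the longest total, a single-row limit returns only one of them.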
Take your pick:
select * from tracks as t where exists (
select 1 from tracks as t2
where t2.track_id <> t.track_id and t2.track_duration = t.track_duration
)
order by t.track_duration, t.track_name;
select distinct t.track_name, t.track_duration
from tracks as t inner join tracks as t2
on t2.track_id <> t.track_id and t2.track_duration = t.track_duration
order by t.track_duration, t.track_name;
select track_name, track_duration
from tracks
where track_duration in (
select track_duration from tracks group by track_duration having count(*) > 1
)
order by track_duration, track_name;
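All three variants aim at the same thing: tracks whose duration is shared with at least one other track. A quick way to sanity-check that reading against the sample results is the sketch below, which runs the IN-subquery form in SQLite via Python's sqlite3 module. The five shared-duration tracks come from the question's expected output; the extra track 'Solo' and the helper names are hypothetical.

```python
import sqlite3

def build_tracks_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (track_id INTEGER PRIMARY KEY, "
                 "track_name TEXT, track_duration REAL)")
    conn.executemany(
        "INSERT INTO tracks (track_name, track_duration) VALUES (?, ?)",
        [("Rain", 4.0), ("Find You", 3.5), ("Solo", 5.0),  # 'Solo' is made up
         ("Silence", 3.5), ("What About Us", 4.0), ("Friends.", 3.5)],
    )
    return conn

def tracks_sharing_duration(conn):
    """Tracks whose duration appears on more than one row."""
    sql = """
        SELECT track_name, track_duration
        FROM tracks
        WHERE track_duration IN (
            SELECT track_duration FROM tracks
            GROUP BY track_duration HAVING COUNT(*) > 1
        )
        ORDER BY track_duration, track_name
    """
    return conn.execute(sql).fetchall()
```

With this data the unique 5.0 track drops out and the remaining five rows come back in the same order as the question's sample results.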

Complex SQL query with multiple tables and relations

In this query, I have to list pairs of players, with their playerID and playerName, who play for the exact same teams. If a player plays for 3 teams, the other has to play for the exact same 3 teams. No less, no more. If two players currently do not play for any team, they should also be included. The query should return (playerID1, playerName1, playerID2, playerName2) with no repetition: if player 1's info comes before player 2's, there should not be another tuple with player 2's info coming before player 1's.
For example if player A plays for yankees and redsox, and player b plays for Yankees, Red Sox, and Dodgers I should not get them. They both have to play for Yankees, and Red Sox and no one else. Right now this query finds answer if players play for any same team.
Tables:
player(playerID: integer, playerName: string)
team(teamID: integer, teamName: string, sport: string)
plays(playerID: integer, teamID: integer)
Example data:
PLAYER
playerID playerName
1 Rondo
2 Allen
3 Pierce
4 Garnett
5 Perkins
TEAM
teamID teamName sport
1 Celtics Basketball
2 Lakers Basketball
3 Patriots Football
4 Red Sox Baseball
5 Bulls Basketball
PLAYS
playerID TeamID
1 1
1 2
1 3
2 1
2 3
3 1
3 3
So I should get this as answer-
2, Allen, 3, Pierce
4, Garnett, 5, Perkins
.
2, Allen, 3, Pierce is an answer because both play exclusively for CELTICS and PATRIOTS.
4, Garnett, 5, Perkins is an answer because both players play for no teams, which should be in the output.
Right now the Query I have is
SELECT p1.PLAYERID,
f1.PLAYERNAME,
p2.PLAYERID,
f2.PLAYERNAME
FROM PLAYER f1,
PLAYER f2,
PLAYS p1
FULL OUTER JOIN PLAYS p2
ON p1.PLAYERID < p2.PLAYERID
AND p1.TEAMID = p2.TEAMID
GROUP BY p1.PLAYERID,
f1.PLAYERID,
p2.PLAYERID,
f2.PLAYERID
HAVING Count(p1.PLAYERID) = Count(*)
AND Count(p2.PLAYERID) = Count(*)
AND p1.PLAYERID = f1.PLAYERID
AND p2.PLAYERID = f2.PLAYERID;
I am not 100% sure but I think this finds players who play for the same team but I want to find out players who play for the exclusively all same TEAMS as explained above
I am stuck on how to approach it after this. Any hints on how to approach this problem? Thanks for your time.
I believe this query will do what you want:
SELECT array_agg(players), player_teams
FROM (
SELECT DISTINCT t1.t1player AS players, t1.player_teams
FROM (
SELECT
p.playerid AS t1id,
concat(p.playerid,':', p.playername, ' ') AS t1player,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t1
INNER JOIN (
SELECT
p.playerid AS t2id,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
Result:
PLAYERS PLAYER_TEAMS
2:Allen,3:Pierce 1,3
4:Garnett,5:Perkins
It uses array_agg over the teamid for each player in plays to match players with the exact same team configuration. I included a column with the teams as an example, but that can be removed without affecting the results as long as it isn't removed from the GROUP BY clause.
SQL Fiddle example. Tested with PostgreSQL 9.2.4.
EDIT: Fixed an error that duplicated rows.
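"Exactly the same teams" can also be written without array functions, as mutual set containment: player A has no team that B lacks, and B has no team that A lacks. The sketch below is one possible formulation, not the answer's method; it runs in SQLite through Python's sqlite3 module on the question's sample data, and the helper names are hypothetical.

```python
import sqlite3

def build_players_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE player (playerID INTEGER PRIMARY KEY, playerName TEXT);
        CREATE TABLE plays (playerID INTEGER, teamID INTEGER);
        INSERT INTO player VALUES (1,'Rondo'),(2,'Allen'),(3,'Pierce'),
                                  (4,'Garnett'),(5,'Perkins');
        INSERT INTO plays VALUES (1,1),(1,2),(1,3),(2,1),(2,3),(3,1),(3,3);
    """)
    return conn

def same_team_pairs(conn):
    """Pairs (a < b) whose team sets contain each other, i.e. are equal."""
    sql = """
        SELECT a.playerID, a.playerName, b.playerID, b.playerName
        FROM player a
        JOIN player b ON a.playerID < b.playerID
        WHERE NOT EXISTS (            -- no team of a that b lacks
                SELECT 1 FROM plays x
                WHERE x.playerID = a.playerID
                  AND NOT EXISTS (SELECT 1 FROM plays y
                                  WHERE y.playerID = b.playerID
                                    AND y.teamID = x.teamID))
          AND NOT EXISTS (            -- no team of b that a lacks
                SELECT 1 FROM plays y
                WHERE y.playerID = b.playerID
                  AND NOT EXISTS (SELECT 1 FROM plays x
                                  WHERE x.playerID = a.playerID
                                    AND x.teamID = y.teamID))
        ORDER BY a.playerID, b.playerID
    """
    return conn.execute(sql).fetchall()
```

Players with no rows in plays pass both checks vacuously, so Garnett and Perkins pair up without a separate UNION branch, and the a.playerID < b.playerID condition keeps each pair from appearing twice.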
Seems that OP probably won't be interested anymore, but in case somebody else finds it useful,
this is query in pure SQL that works (for me at least ;))
SELECT M.p1, pr1.playername, M.p2, pr2.playername FROM player pr1
INNER JOIN player pr2 INNER JOIN
(
SELECT plays1.player p1, plays2.player p2, plays1.team t1 FROM plays plays1
INNER JOIN plays plays2
ON (plays1.player < plays2.player AND plays1.team = plays2.team)
GROUP BY plays1.player, plays2.player HAVING COUNT(*) =
((SELECT COUNT(*) FROM plays plays3 WHERE plays3.player = plays1.player) +
(SELECT COUNT(*) FROM plays plays4 WHERE plays4.player = plays2.player)) /2
) M ON pr1.playerID = M.p1 AND pr2.playerID = M.p2
UNION ALL
SELECT M.pid, M.pname, N.pid2, N.pname2 FROM
(
(SELECT p.playerID pid, p.playerName pname, pl.team FROM player p
LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) M
INNER JOIN
(SELECT p.playerID pid2, p.playerName pname2, pl.team FROM player p
LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) N
ON (pid < pid2)
)
It's no big deal; here is a solution:
with gigo as(select a.playerid as playerid,count(b.teamname) as nteams from player a
full outer join plays c on a.playerid=c.playerid full outer join team b
on b.teamid=c.teamid group by a.playerid)
select array_agg(a.*),g.nteams from player a inner join gigo g on a.playerid=g.playerid
group by g.nteams having count(a.*)>1 order by g.nteams desc
This solution works for me :
SELECT TMP1. PLAYERID,TMP2.PLAYERID FROM
(
SELECT a.playerid , a.teamid,b.team_sum
FROM plays A
INNER JOIN
(
SELECT PLAYERID,SUM(teamid) AS team_sum
FROM plays
GROUP BY 1
) B
ON a.playerid=b.playerid
) TMP1
INNER JOIN
(
SELECT a.playerid , a.teamid,b.team_sum
FROM plays A
INNER JOIN
(
SELECT PLAYERID,SUM(teamid) AS team_sum
FROM plays
GROUP BY 1
) B
ON a.playerid=b.playerid
)TMP2
ON TMP1.PLAYERID < TMP2.PLAYERID
AND TMP1.TEAMID=TMP2.TEAMID
AND TMP1.TEAM_SUM=TMP2.TEAM_SUM
GROUP BY 1,2
UNION ALL
SELECT n1,n2 FROM
(
SELECT TMP3.PLAYERID AS n1,TMP4.PLAYERID AS n2 FROM
PLAYER TMP3
INNER JOIN PLAYER TMP4
ON TMP3.PLAYERID<TMP4.PLAYERID
WHERE TMP3.PLAYERID NOT IN (SELECT PLAYERID FROM plays )
AND tmp4.playerid NOT IN (SELECT playerid FROM plays)
) TMP5
Two possible solutions come to mind:
Cursor - Looping through each player and comparing him to all the others until reaching a conclusion.
Recursive query - Same idea, though slightly more complicated, but definitely the better way to do it. Probably also has better performance.
Can you provide some sample data so that I can create an example?
It seems like the basic datatype you want is sets, rather than arrays. So one option may be to use PL/Python with code similar to that below (see bottom of this answer for a function that might be adapted to this end). Of course, this isn't a "pure SQL" approach by any means.
But sticking to PostgreSQL (albeit not standard SQL), you may also want to use DISTINCT with array_agg. Note that the following only gives the first pair that meets the criteria (in principle there could be many more).
WITH teams AS (
SELECT playerID, array_agg(DISTINCT teamID ORDER BY teamID) AS teams
FROM plays
GROUP BY playerID),
teams_w_nulls AS (
SELECT a.playerID, b.teams
FROM player AS a
LEFT JOIN teams AS b
ON a.playerID=b.playerID),
player_sets AS (
SELECT teams, array_agg(DISTINCT playerID ORDER BY playerID) AS players
FROM teams_w_nulls
GROUP BY teams
-- exclude players who only share a team list with themselves.
HAVING array_length(array_agg(DISTINCT playerID ORDER BY playerID),1)>1)
SELECT a.teams, b.playerID, b.playerName, c.playerID, c.playerName
FROM player_sets AS a
INNER JOIN player AS b
ON a.players[1]=b.playerID
INNER JOIN player AS c
ON a.players[2]=c.playerID;
The query above gives the following output:
teams | playerid | playername | playerid | playername
-------+----------+------------+----------+------------
{1,3} | 2 | Allen | 3 | Pierce
| 4 | Garnett | 5 | Perkins
(2 rows)
Example PL/Python functions:
CREATE OR REPLACE FUNCTION set(the_list integer[])
RETURNS integer[] AS
$BODY$
return list(set(the_list))
$BODY$
LANGUAGE plpython2u;
CREATE OR REPLACE FUNCTION pairs(a_set integer[])
RETURNS SETOF integer[] AS
$BODY$
def pairs(x):
for i in range(len(x)):
for j in x[i+1:]:
yield [x[i], j]
return list(pairs(a_set))
$BODY$
LANGUAGE plpython2u;
SELECT set(ARRAY[1, 1, 2, 3, 4, 5, 6, 6]);
Version of code above using these functions (output is similar, but this approach selects all pairs when there is more than one for a given set of teams):
WITH teams AS (
SELECT playerID, set(array_agg(teamID)) AS teams
FROM plays
GROUP BY playerID),
teams_w_nulls AS (
SELECT a.playerID, b.teams
FROM player AS a
LEFT JOIN teams AS b
ON a.playerID=b.playerID),
player_pairs AS (
SELECT teams, pairs(set(array_agg(playerID))) AS pairs
FROM teams_w_nulls
GROUP BY teams)
-- no need to exclude players who only share a team
-- list with themselves.
SELECT teams, pairs[1] AS player_1, pairs[2] AS player_2
FROM player_pairs;
We build a query with the count of teams per player and the sum of ascii(team_name)+team_id, which we call team_value. We then self-join that query where the counts and team_values match but the player IDs differ, which gives us the IDs we want to fetch.
select * from player where player_id in
(
select set2.player_id orig
from
(select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
from plays a, player b , team c
where a.player_id(+)=b.player_id
and a.team_id = c.team_id(+)
group by b.player_id) set1,
(select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
from plays a, player b , team c
where a.player_id(+)=b.player_id
and a.team_id = c.team_id(+)
group by b.player_id) set2
where set1.count=set2.count and set1.team_value=set2.team_value
and set1.player_id<>set2.player_id
)
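One thing worth checking with a count-plus-sum fingerprint like this is whether two different sets of teams can produce the same value. The sketch below is a plain Python check over the question's TEAM table. It assumes Oracle's ascii() returns the code of the first character of the name; team_value and colliding_team_sets are hypothetical helper names.

```python
from itertools import combinations

# teamID -> teamName, copied from the question's TEAM table
TEAMS = {1: "Celtics", 2: "Lakers", 3: "Patriots", 4: "Red Sox", 5: "Bulls"}

def team_value(team_ids):
    """Mimic sum(team_id + ascii(team_name)) for one player's teams."""
    return sum(team_id + ord(TEAMS[team_id][0]) for team_id in team_ids)

def colliding_team_sets(size):
    """Group distinct team sets of the given size that share a team_value."""
    by_value = {}
    for combo in combinations(sorted(TEAMS), size):
        by_value.setdefault(team_value(combo), []).append(combo)
    return [group for group in by_value.values() if len(group) > 1]

print(colliding_team_sets(2))
```

With the sample teams, a player on Celtics and Red Sox gets the same count and the same team_value as a player on Patriots and Bulls, so the two would be reported as a match even though they share no team. Comparing the actual sets of team IDs avoids this.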
Here is the simple query with UNION and 2-3 simple joins.
1st query before UNION contains player name and playerid who has played for same number of teams for equal number of times.
2nd query after UNION contains player name and playerid who has not played for any team at all.
Simply copy paste this query and try to execute it, you will see the expected results.
select playername,c.playerid from
(select a.cnt, a.playerid from
(select count(1) cnt , PLAYERID from plays group by PLAYERID) a ,
(select count(1) cnt , PLAYERID from plays group by PLAYERID) b
where a.cnt=b.cnt
and a.playerid<> b.playerid ) c ,PLAYER d
where c.playerid=d.playerid
UNION
select e.playername,e.playerid
from player e
left outer join plays f on
e.playerid=f.playerid where nvl(teamid,0 )=0
Try this one :
Here test is PLAYS table in your question.
select group_concat(b.name),a.teams from
(SELECT playerid, group_concat(distinct teamid ORDER BY teamid) AS teams
FROM test
GROUP BY playerid) a, player b
where a.playerid=b.playerid
group by a.teams
union
select group_concat(c.name order by c.playerid),null from player c where c.playerid not in (select playerid from test);
For anyone interested, this simple query works for me
SELECT UNIQUE PLR1.PID,PLR1.PNAME, PLR2.PID, PLR2.PNAME
FROM PLAYS PLY1,PLAYS PLY2, PLAYER PLR1, PLAYER PLR2
WHERE PLR1.PID < PLR2.PID AND PLR1.PID = PLY1.PID(+) AND PLR2.PID = PLY2.PID(+)
AND NOT EXISTS(( SELECT PLY3.TEAMID FROM PLAYS PLY3 WHERE PLY3.PID = PLR1.PID)
MINUS
( SELECT PLY4.TEAMID FROM PLAYS PLY4 WHERE PLY4.PID = PLR2.PID));
select p1.playerId, p2.playerId, count(p1.playerId)
from plays p1, plays p2
WHERE p1.playerId<p2.playerId
and p1.teamId = p2.teamId
GROUP BY p1.playerId, p2.playerId
having count(*) = (select count(*) from plays where playerid = p1.playerid)
WITH temp AS (
SELECT p.playerid, p.playername, listagg(t.teamname,',') WITHIN GROUP (ORDER BY t.teamname) AS teams
FROM player p full OUTER JOIN plays p1 ON p.playerid = p1.playerid
LEFT JOIN team t ON p1.teamid = t.teamid GROUP BY (p.playerid , p.playername))
SELECT concat(concat(t1.playerid,','), t1.playername), t1.teams
FROM temp t1 WHERE nvl(t1.teams,' ') IN (
SELECT nvl(t2.teams,' ') FROM temp t2
WHERE t1.playerid <> t2.playerid)
ORDER BY t1.playerid
This is ANSI SQL, without any special functions.
SELECT TAB1.T1_playerID AS playerID1 , TAB1.playerName1 ,
TAB1.T2_playerID AS playerID2, TAB1. playerName2
FROM
(select T1.playerID AS T1_playerID , T3. playerName AS playerName1 ,
T2.playerID AS T2_playerID , T4. playerName AS playerName2 ,COUNT (T1.TeamID) AS MATCHING_TEAM_ID_CNT
FROM PLAYS T1
INNER JOIN PLAYS T2 ON( T1.TeamID = T2.TeamID AND T1.playerID <> T2.playerID )
INNER JOIN player T3 ON ( T1.playerID=T3.playerID)
INNER JOIN player T4 ON ( T2.playerID=T4.playerID)
GROUP BY 1,2,3,4
) TAB1
INNER JOIN
( SELECT T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
FROM PLAYS T1
GROUP BY T1.playerID) TAB2
ON(TAB1.T2_playerID=TAB2.playerID AND
TAB1.MATCHING_TEAM_ID_CNT =TAB2.TOTAL_TEAM_CNT)
INNER JOIN
( SELECT T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
FROM PLAYS T1
GROUP BY T1.playerID
) TAB3
ON( TAB1. T1_playerID = TAB3.playerID AND
TAB1.MATCHING_TEAM_ID_CNT=TAB3.TOTAL_TEAM_CNT)
WHERE playerID1 < playerID2
UNION ALL (
SELECT T1.playerID, T1.playerName ,T2.playerID,T2.playerName
FROM
PLAYER T1 INNER JOIN PLAYER T2
ON (T1.playerID<T2.playerID)
WHERE T1.playerID NOT IN ( SELECT playerID FROM PLAYS))
Assuming your teamId is unique, this query will work. It identifies players who have exactly the same teams by summing each player's team IDs (the sum is null for a player with no teams), then counts how many players share each sum. I tested it using SQL Fiddle on PostgreSQL 9.3.
SELECT
b.playerID
,b.playerName
FROM (
--Join the totals of teams to your player information and then count over the team matches.
SELECT
p.playerID
,p.playerName
,m.TeamMatches
,COUNT(*) OVER(PARTITION BY TeamMatches) as Matches
FROM player p
LEFT JOIN (
--Assuming your teamID is unique as it should be. If it is then a sum of the team ids for a player will give you each team they play for.
--If for some reason your team id is not unique then rank the table and join same as below.
SELECT
ps.playerName
,ps.playerID
,SUM(t.teamID) as TeamMatches
FROM plays p
LEFT JOIN team t ON p.teamID = t.teamID
LEFT JOIN player ps ON p.playerID = ps.playerID
GROUP BY
ps.playerName
,ps.playerID
) m ON p.playerID = m.playerID
) b
WHERE
b.Matches <> 1
This query should solve it, using a self join on PLAYS:
- Compare on the player ID (p1.playerId < p2.playerId) so that each pair appears only once.
- Compare the matching row count with the total count for each player.
select p1.playerId, p2.playerId, count(p1.playerId)
from plays p1, plays p2
WHERE p1.playerId<p2.playerId
and p1.teamId = p2.teamId
GROUP BY p1.playerId, p2.playerId
having count(*) = (select count(*) from plays where playerid = p1.playerid)
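A minimal sketch of the count-comparison idea, run from Python against an in-memory SQLite database. The table layout and sample rows are assumptions made for illustration. It also compares the shared-team count against the second player's total, which the query above does not do, so a player whose teams are only a subset of another player's teams is not reported. Players with no rows in plays are not covered.

```python
import sqlite3

def same_team_pairs(rows):
    """Return (p1, p2) pairs with p1 < p2 whose team sets are identical.

    rows: iterable of (playerId, teamId) tuples (hypothetical sample layout).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE plays (playerId INTEGER, teamId INTEGER, "
        "PRIMARY KEY (playerId, teamId))"
    )
    con.executemany("INSERT INTO plays VALUES (?, ?)", rows)
    sql = """
        SELECT p1.playerId, p2.playerId
        FROM plays p1
        JOIN plays p2
          ON p1.teamId = p2.teamId AND p1.playerId < p2.playerId
        GROUP BY p1.playerId, p2.playerId
        -- the number of shared teams must equal BOTH players' totals
        HAVING COUNT(*) = (SELECT COUNT(*) FROM plays WHERE playerId = p1.playerId)
           AND COUNT(*) = (SELECT COUNT(*) FROM plays WHERE playerId = p2.playerId)
        ORDER BY p1.playerId, p2.playerId
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

# Players 1 and 2 both play for teams 10 and 20; player 3 only for team 10.
sample = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (4, 30)]
pairs = same_team_pairs(sample)
print(pairs)
```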
Create a function in SQL Server 2008:
CREATE FUNCTION [dbo].[fngetTeamIDs] ( @PayerID int ) RETURNS varchar(101) AS Begin
declare @str varchar(1000)
SELECT @str= coalesce(@str + ', ', '') + CAST(a.TeamID AS varchar(100)) FROM (SELECT DISTINCT TeamID from Plays where PayerId=@PayerID) a
return @str
END
--select dbo.fngetTeamIDs(2)
The query starts here:
drop table #temp,#A,#B,#C,#D
(select PayerID,count(*) count
into #temp
from Plays
group by PayerID)
select *
into #A
from #temp as T
where T.count in (
select T1.count from #temp as T1
group by T1.count having count(T1.count)>1
)
select A.*,P.TeamID
into #B
from #A A inner join Plays P
on A.PayerID=P.PayerID
order by A.count
select B.PayerId,B.count,
(
select dbo.fngetTeamIDs(B.PayerId)
) as TeamIDs
into #C
from #B B
group by B.PayerId,B.count
select TeamIDs
into #D
from #c as C
group by C.TeamIDs
having count(C.TeamIDs)>1
select C.PayerId,P.PlayerName,D.TeamIDs
from #D D inner join #C C
on D.TeamIDs=C.TeamIDs
inner join Player P
on C.PayerID=P.PlayerID

SQL Server: querying hierarchical and referenced data

I'm working on an asset database that has a hierarchy. Also, there is a "ReferenceAsset" table, that effectively points back to an asset. The Reference Asset basically functions as an override, but it is selected as if it were a unique, new asset. One of the overrides that gets set, is the parent_id.
Columns that are relevant to selecting the hierarchy:
Asset: id (primary), parent_id
Asset Reference: id (primary), asset_id (foreignkey->Asset), parent_id (always an Asset)
---EDITED 5/27----
Sample Relevant Table Data (after joins):
id | asset_id | name        | parent_id | milestone | type
3  | 3        | suit        | null      | march     | shape
4  | 4        | suit_banker | 3         | april     | texture
5  | 5        | tie         | null      | march     | shape
6  | 6        | tie_red     | 5         | march     | texture
7  | 7        | tie_diamond | 5         | june      | texture
-5 | 6        | tie_red     | 4         | march     | texture
Rows with id < 0 (like the last row) are referenced assets. Referenced assets have a few columns that are overridden (in this case, only parent_id is important).
The expectation is that if I select all assets from april, I should do a secondary select to get the entire tree branches of the matching query.
So initially the query match would result in:
4  | 4        | suit_banker | 3         | april     | texture
Then, after the CTE, we get the complete hierarchy and our result should be this (so far this is working):
3  | 3        | suit        | null      | march     | shape
4  | 4        | suit_banker | 3         | april     | texture
-5 | 6        | tie_red     | 4         | march     | texture
As you can see, the parent of id -5 is there, but what is missing, and needed, is the referenced asset and the parent of the referenced asset:
5  | 5        | tie         | null      | march     | shape
6  | 6        | tie_red     | 5         | march     | texture
Currently my solution works for this, but it is limited to a single depth of references (and I feel the implementation is quite ugly).
---Edited----
Here is my primary Selection Function. This should better demonstrate where the real complication lies: the AssetReference.
Select A.id as id, A.id as asset_id, A.name,A.parent_id as parent_id, A.subPath, T.name as typeName, A2.name as parent_name, B.name as batchName,
L.name as locationName,AO.owner_name as ownerName, T.id as typeID,
M.name as milestoneName, A.deleted as bDeleted, 0 as reference, W.phase_name, W.status_name
FROM Asset as A Inner Join Type as T on A.type_id = T.id
Inner Join Batch as B on A.batch_id = B.id
Left Join Location L on A.location_id = L.id
Left Join Asset A2 on A.parent_id = A2.id
Left Join AssetOwner AO on A.owner_id = AO.owner_id
Left Join Milestone M on A.milestone_id = M.milestone_id
Left Join Workflow as W on W.asset_id = A.id
where A.deleted <= @showDeleted
UNION
Select -1*AR.id as id, AR.asset_id as asset_id, A.name, AR.parent_id as parent_id, A.subPath, T.name as typeName, A2.name as parent_name, B.name as batchName,
L.name as locationName,AO.owner_name as ownerName, T.id as typeID,
M.name as milestoneName, A.deleted as bDeleted, 1 as reference, NULL as phase_name, NULL as status_name
FROM Asset as A Inner Join Type as T on A.type_id = T.id
Inner Join Batch as B on A.batch_id = B.id
Inner Join AssetReference AR on AR.asset_id = A.id
Left Join Location L on A.location_id = L.id
Left Join Asset A2 on AR.parent_id = A2.id
Left Join AssetOwner AO on A.owner_id = AO.owner_id
Left Join Milestone M on A.milestone_id = M.milestone_id
where A.deleted <= @showDeleted
I have a stored procedure that takes a temp table (#temp) and finds all the elements of the hierarchy. The strategy I employed was this:
1. Select the entire system hierarchy into a temp table (#treeIDs), represented by a comma-separated list of each entire tree branch
2. Get the entire hierarchy of assets matching the query (from #temp)
3. Get all reference assets pointed to by assets from the hierarchy
4. Parse the hierarchy of all reference assets
This works for now because reference assets are always the last item on a branch, but if they weren't, I think I would be in trouble. I feel like I need some better form of recursion.
Here is my current code, which is working, but I am not proud of it, and I know it is not robust (because it only works if the references are at the bottom):
Step 1. build the entire hierarchy
;WITH Recursive_CTE AS (
SELECT Cast(id as varchar(100)) as Hierarchy, parent_id, id
FROM #assetIDs
Where parent_id is Null
UNION ALL
SELECT
CAST(parent.Hierarchy + ',' + CAST(t.id as varchar(100)) as varchar(100)) as Hierarchy, t.parent_id, t.id
FROM Recursive_CTE parent
INNER JOIN #assetIDs t ON t.parent_id = parent.id
)
Select Distinct h.id, Hierarchy as idList into #treeIDs
FROM ( Select Hierarchy, id FROM Recursive_CTE ) parent
CROSS APPLY dbo.SplitIDs(Hierarchy) as h
Step 2. Select the branches of all assets that match the query
Select DISTINCT L.id into #RelativeIDs FROM #treeIDs
CROSS APPLY dbo.SplitIDs(idList) as L
WHERE #treeIDs.id in (Select id FROM #temp)
Step 3. Get all Reference Assets in the branches
(Reference assets have negative id values, hence the id < 0 part)
Select asset_id INTO #REFLinks FROM #AllAssets WHERE id in
(Select #AllAssets.asset_id FROM #AllAssets Inner Join #RelativeIDs
on #AllAssets.id = #RelativeIDs.id Where #RelativeIDs.id < 0)
Step 4. Get the branches of anything found in step 3
Select DISTINCT L.id into #extraRelativeIDs FROM #treeIDs
CROSS APPLY dbo.SplitIDs(idList) as L
WHERE
exists (Select #REFLinks.asset_id FROM #REFLinks WHERE #REFLinks.asset_id = #treeIDs.id)
and Not Exists (select id FROM #RelativeIDs Where id = #treeIDs.id)
I've tried to just show the relevant code. I am super grateful to anyone who can help me find a better solution!
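One possible shape for the "better form of recursion" asked for above, sketched in Python against an in-memory SQLite database using the sample rows from the question. The table name all_assets and the two CTE names are hypothetical. The first CTE walks down from the matching rows; the second walks up through parent_id and, for reference rows (id < 0), also jumps to the referenced asset, so references are followed to any depth.

```python
import sqlite3

ROWS = [
    # (id, asset_id, name, parent_id, milestone), copied from the sample data
    (3, 3, "suit", None, "march"),
    (4, 4, "suit_banker", 3, "april"),
    (5, 5, "tie", None, "march"),
    (6, 6, "tie_red", 5, "march"),
    (7, 7, "tie_diamond", 5, "june"),
    (-5, 6, "tie_red", 4, "march"),  # reference row pointing at asset 6
]

QUERY = """
WITH RECURSIVE
down(id) AS (
    -- rows matching the filter, plus everything below them
    SELECT id FROM all_assets WHERE milestone = ?
    UNION
    SELECT c.id FROM all_assets c JOIN down d ON c.parent_id = d.id
),
up(id) AS (
    -- from every row found so far, follow the parent link and,
    -- for reference rows, the link to the referenced asset
    SELECT id FROM down
    UNION
    SELECT n.id
    FROM up
    JOIN all_assets cur ON cur.id = up.id
    JOIN all_assets n
      ON n.id = cur.parent_id
      OR (cur.id < 0 AND n.id = cur.asset_id)
)
SELECT id FROM up ORDER BY id
"""

def related_ids(milestone):
    """Return the ids of the full branches for assets in the given milestone."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE all_assets (id INTEGER PRIMARY KEY, asset_id INTEGER, "
        "name TEXT, parent_id INTEGER, milestone TEXT)"
    )
    con.executemany("INSERT INTO all_assets VALUES (?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in con.execute(QUERY, (milestone,))]
    con.close()
    return ids

april_ids = related_ids("april")
print(april_ids)
```

The sketch relies on UNION to discard rows already visited. SQL Server requires UNION ALL between the anchor and recursive members of a CTE, so a port would need a DISTINCT on the final select and some care that the data contains no cycles.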
--get the root node to start from (there could be more than one root, which would require revising the query a bit)
DECLARE @AssetID int = (SELECT AssetID FROM Asset WHERE Parent_ID IS NULL);
--algorithm is relational recursion
--gets the top level in hierarchy we want. The hierarchy column
--will show the row's place in the hierarchy from this query only
--not in the overall reality of the row's place in the table
WITH Asset_Hierarchy (AssetID, Parent_ID, LevelCode, Hierarchy)
AS
(
SELECT AssetID, Parent_ID,
1 AS LevelCode, CAST(AssetID AS varchar(max)) AS Hierarchy
FROM Asset
WHERE AssetID = @AssetID
UNION ALL
--joins back to the CTE to recursively retrieve the rows
--note that LevelCode is incremented on each iteration
SELECT a.AssetID, a.Parent_ID,
h.LevelCode + 1 AS LevelCode,
h.Hierarchy + '\' + CAST(a.AssetID AS varchar(20)) AS Hierarchy
FROM Asset AS a
INNER JOIN Asset_Hierarchy AS h
--use to get children, since the Parent_ID of the child will be set to the value
--of the current row
ON a.Parent_ID = h.AssetID
--use to get parents instead, since the parent of the Asset_Hierarchy row will be the asset,
--not the parent.
--ON a.AssetID = h.Parent_ID
)
--return results from the CTE, joining to the Asset data to get the asset name
SELECT a.AssetID, a.name,
h.LevelCode, h.Hierarchy
FROM Asset AS a
INNER JOIN Asset_Hierarchy AS h
ON a.AssetID = h.AssetID
ORDER BY h.Hierarchy;
---that is the structure you will want. I would need a little more clarification of your table structure
It would help to know your underlying table structure. There are two approaches that should work, depending on your environment. SQL Server understands XML, so you could store the hierarchy as an XML structure. Alternatively, use a single table in which each row has a unique primary key id and a parentid, where parentid is a foreign key referencing id. The data for the node are just standard columns. You can use a CTE, or a function powering a calculated column, to determine the degree of nesting for each node. The limitation is that a node can have only one parent.
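As a rough illustration of the id/parentid layout and of using a CTE to work out the degree of nesting, here is a sketch in Python with an in-memory SQLite database. The table name node and the sample rows are assumptions, not the asker's actual schema.

```python
import sqlite3

def nesting_levels(nodes):
    """Map each node id to its depth, with root nodes at depth 1.

    nodes: iterable of (id, parent_id) tuples; parent_id is None for a root.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES node(id))"
    )
    con.executemany("INSERT INTO node VALUES (?, ?)", nodes)
    sql = """
        WITH RECURSIVE tree(id, depth) AS (
            -- anchor: rows without a parent are the roots
            SELECT id, 1 FROM node WHERE parent_id IS NULL
            UNION ALL
            -- recursive step: a child sits one level below its parent
            SELECT n.id, t.depth + 1
            FROM node n JOIN tree t ON n.parent_id = t.id
        )
        SELECT id, depth FROM tree
    """
    levels = dict(con.execute(sql).fetchall())
    con.close()
    return levels

levels = nesting_levels([(3, None), (4, 3), (5, None), (6, 5), (7, 5)])
print(levels)
```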

Limiting recursion to certain level

I have a SQL table named Player and another called Team.
Each Player MUST belong to a team via a foreign key TeamID.
Each Team can belong to another Team via a recursive field ParentTeamID.
So it could be (top down)...
TeamA
TeamB
Team76
Group8
Player_ME
My question is, if I'm given a Player's PlayerID (the PK for that table), what is the best way to get the top Team?
My query so far (which gets all teams):
WITH TeamTree
AS (
SELECT ParentTeam.*, Player.PlayerID, 0 as Level
FROM Team ParentTeam
INNER JOIN Player ON Player.TeamID = ParentTeam.TeamID
WHERE Player.PlayerID IN (SELECT * FROM dbo.Split(@PlayerIDs,','))
UNION ALL
SELECT ChildTeam.*, TeamTree.PlayerID AS PlayerID, TeamTree.Level + 1
FROM Team ChildTeam
INNER JOIN TeamTree TeamTree
ON ChildTeam.TeamID = TeamTree.ParentTeamID
)
Now whilst I think this is the right place to start I think there may be a better way. Plus I'm kinda stuck! I tried using Level in a join (inside a subquery) but it didn't work.
Any ideas on how to work my way up the tree and get only the top level details?
Edit:
A ParentTeam CAN be a ParentTeam (infinite recursion), but a Player can only belong to one Team.
Data Structure
Team:
TeamID (PK), Name, ParentTeamID (Recursive field)
Player:
PlayerID (PK), Name, TeamID (FK)
Sample Data:
Team:
1, TeamA, NULL
2, TeamB, 1
3, Team76, 2
4, Group8, 3
Player:
1, Player_ME, 4
2, Player_TWO, 2
So with the above data, both players should show (in the query) that they have a "TopLevelTeam" of TeamA
I believe this is what you are looking for, with a bit of extra info thrown in for free :-)
Andrew had the correct idea in his edited version, but I think his implementation is incorrect.
The schema and query are available at SQL Fiddle
with teamCTE as (
select TeamID,
TeamName,
cast(null as int) as ParentTeamID,
cast(null as varchar(10)) as ParentTeamName,
TeamID TopTeamID,
TeamName TopTeamName,
1 as TeamLevel
from team
where ParentTeamID is null
union all
select t.TeamID,
t.TeamName,
c.TeamID,
c.TeamName,
c.TopTeamID,
c.TopTeamName,
TeamLevel+1 as TeamLevel
from team t
join teamCTE c
on t.ParentTeamID = c.TeamID
)
select p.PlayerID,
p.PlayerName,
t.*
from player p
join teamCTE t
on p.TeamID = t.TeamID
EDIT - answer to question in comment
You can navigate to any level within the player's team hierarchy simply by joining to the CTE a 2nd time. In your case you asked for the 2nd top most team: SQL Fiddle
with teamCTE as (
select TeamID,
TeamName,
cast(null as int) as ParentTeamID,
cast(null as varchar(10)) as ParentTeamName,
TeamID TopTeamID,
TeamName TopTeamName,
1 as TeamLevel
from team
where ParentTeamID is null
union all
select t.TeamID,
t.TeamName,
c.TeamID,
c.TeamName,
c.TopTeamID,
c.TopTeamName,
TeamLevel+1 as TeamLevel
from team t
join teamCTE c
on t.ParentTeamID = c.TeamID
)
select p.PlayerID,
p.PlayerName,
t1.*,
t2.TeamID Level2TeamID,
t2.TeamName Level2TeamName
from player p
join teamCTE t1
on p.TeamID = t1.TeamID
join teamCTE t2
on t1.TopTeamID = t2.TopTeamID
and t2.TeamLevel=2
WITH TeamTree
AS (
SELECT ParentTeam.*, Player.PlayerID AS UrPlayerID, 0 as Level
FROM Team ParentTeam
INNER JOIN Player ON Player.TeamID = ParentTeam.TeamID
WHERE Player.PlayerID IN (SELECT * FROM dbo.Split(@PlayerIDs,','))
UNION ALL
SELECT ChildTeam.*, TeamTree.PlayerID AS PlayerID, TeamTree.Level + 1
FROM Team ChildTeam
INNER JOIN TeamTree TeamTree
ON ChildTeam.ParentTeamID = TeamTree.TeamID /* These were reversed, I think */
AND UrPlayerID=ChildTeam.PlayerID /* ADDED */
)
Otherwise you get a huge duplication of rows, something like the square of the number of players, don't you?
--
(After comment below)
Quite right, I misread the schema. Look, you don't need to bring the player in until the very end. I thought the team tree arrangement might differ by player, but it doesn't. So
/* SQL Server syntax; PostgreSQL, MySQL and SQLite need WITH RECURSIVE */
WITH TeamTree AS (
SELECT TeamID, ParentTeamID FROM Team
UNION ALL
SELECT T1.TeamID, T2.ParentTeamID FROM TeamTree T1 JOIN Team T2 ON T1.ParentTeamID=T2.TeamID
)
SELECT TeamTree.* FROM TeamTree JOIN Team T3
ON TeamTree.ParentTeamID=T3.TeamID WHERE T3.ParentTeamID IS NULL;
This gives you a table of each team and its root ancestor. Now join that to the player table.
WITH TeamTree AS (
SELECT TeamID, ParentTeamID FROM Team
UNION ALL
SELECT T1.TeamID, T2.ParentTeamID FROM TeamTree T1 JOIN Team T2 ON T1.ParentTeamID=T2.TeamID
)
SELECT * FROM Player JOIN (
SELECT TeamTree.* FROM TeamTree JOIN Team T3
ON TeamTree.ParentTeamID=T3.TeamID WHERE T3.ParentTeamID IS NULL) teamtree2
ON Player.TeamID=teamtree2.TeamID;
You can rejoin with Team if you need more columns.
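To make the team-to-root mapping concrete, here is a small sketch in Python using an in-memory SQLite database loaded with the sample data from the question. The column name AncestorID and the function name are made up for illustration, and SQLite needs the RECURSIVE keyword, unlike SQL Server.

```python
import sqlite3

def top_level_teams():
    """Return (player name, top-level team name) for the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, Name TEXT, ParentTeamID INTEGER);
        CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, Name TEXT, TeamID INTEGER);
        INSERT INTO Team VALUES
            (1, 'TeamA', NULL), (2, 'TeamB', 1), (3, 'Team76', 2), (4, 'Group8', 3);
        INSERT INTO Player VALUES (1, 'Player_ME', 4), (2, 'Player_TWO', 2);
    """)
    sql = """
        WITH RECURSIVE TeamTree(TeamID, AncestorID) AS (
            -- every team paired with its direct parent
            SELECT TeamID, ParentTeamID FROM Team
            UNION ALL
            -- then with its parent's parent, and so on up the tree
            SELECT tt.TeamID, t.ParentTeamID
            FROM TeamTree tt JOIN Team t ON tt.AncestorID = t.TeamID
        )
        SELECT p.Name, root.Name
        FROM Player p
        JOIN TeamTree tt ON p.TeamID = tt.TeamID
        JOIN Team root ON root.TeamID = tt.AncestorID
        WHERE root.ParentTeamID IS NULL   -- keep only the root ancestor
        ORDER BY p.PlayerID
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

top_teams = top_level_teams()
print(top_teams)
```

As written, a player assigned directly to a root team has no ancestor row and drops out of the result; seeding the CTE with each team paired with itself would be one way to include them.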