Complex SQL query with multiple tables and relations - sql

In this query, I have to list pairs of players, with their playerID and playerName, who play for the exact same teams. If a player plays for 3 teams, the other has to play for exactly the same 3 teams. No less, no more. If two players currently do not play for any team, they should also be included. The query should return (playerID1, playerName1, playerID2, playerName2) with no repetition: if player 1's info comes before player 2's, there should not be another tuple with player 2's info coming before player 1's.
For example, if player A plays for the Yankees and Red Sox, and player B plays for the Yankees, Red Sox, and Dodgers, I should not get them. They both have to play for the Yankees and Red Sox and no one else. Right now this query finds an answer if the players play for any same team.
Tables:
player(playerID: integer, playerName: string)
team(teamID: integer, teamName: string, sport: string)
plays(playerID: integer, teamID: integer)
Example data:
PLAYER
playerID playerName
1 Rondo
2 Allen
3 Pierce
4 Garnett
5 Perkins
TEAM
teamID teamName sport
1 Celtics Basketball
2 Lakers Basketball
3 Patriots Football
4 Red Sox Baseball
5 Bulls Basketball
PLAYS
playerID TeamID
1 1
1 2
1 3
2 1
2 3
3 1
3 3
So I should get this as the answer:
2, Allen, 3, Pierce
4, Garnett, 5, Perkins
2, Allen, 3, Pierce is an answer because both play exclusively for the Celtics and Patriots.
4, Garnett, 5, Perkins is an answer because both players play for no teams, which should be in the output.
Right now the Query I have is
SELECT p1.PLAYERID,
f1.PLAYERNAME,
p2.PLAYERID,
f2.PLAYERNAME
FROM PLAYER f1,
PLAYER f2,
PLAYS p1
FULL OUTER JOIN PLAYS p2
ON p1.PLAYERID < p2.PLAYERID
AND p1.TEAMID = p2.TEAMID
GROUP BY p1.PLAYERID,
f1.PLAYERID,
p2.PLAYERID,
f2.PLAYERID
HAVING Count(p1.PLAYERID) = Count(*)
AND Count(p2.PLAYERID) = Count(*)
AND p1.PLAYERID = f1.PLAYERID
AND p2.PLAYERID = f2.PLAYERID;
I am not 100% sure, but I think this finds players who play for the same team, whereas I want to find players who play for exactly the same set of teams, as explained above.
I am stuck on how to approach it after this. Any hints on how to approach this problem? Thanks for your time.

I believe this query will do what you want:
SELECT array_agg(players), player_teams
FROM (
SELECT DISTINCT t1.t1player AS players, t1.player_teams
FROM (
SELECT
p.playerid AS t1id,
concat(p.playerid,':', p.playername, ' ') AS t1player,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t1
INNER JOIN (
SELECT
p.playerid AS t2id,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
Result:
PLAYERS PLAYER_TEAMS
2:Allen,3:Pierce 1,3
4:Garnett,5:Perkins
It uses array_agg over the teamid for each player in plays to match players with the exact same team configuration. I included a column with the teams as an example, but it can be removed from the select list without affecting the results, as long as it isn't removed from the group by clause.
SQL Fiddle example. Tested with PostgreSQL 9.2.4.
EDIT: Fixed an error that duplicated rows.

It seems that the OP probably won't be interested anymore, but in case somebody else finds it useful,
this is a query in pure SQL that works (for me at least ;))
SELECT M.p1, pr1.playername, M.p2, pr2.playername FROM player pr1
INNER JOIN player pr2 INNER JOIN
(
SELECT plays1.player p1, plays2.player p2, plays1.team t1 FROM plays plays1
INNER JOIN plays plays2
ON (plays1.player < plays2.player AND plays1.team = plays2.team)
GROUP BY plays1.player, plays2.player HAVING COUNT(*) =
((SELECT COUNT(*) FROM plays plays3 WHERE plays3.player = plays1.player) +
(SELECT COUNT(*) FROM plays plays4 WHERE plays4.player = plays2.player)) /2
) M ON pr1.playerID = M.p1 AND pr2.playerID = M.p2
UNION ALL
SELECT M.pid, M.pname, N.pid2, N.pname2 FROM
(
(SELECT p.playerID pid, p.playerName pname, pl.team FROM player p
LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) M
INNER JOIN
(SELECT p.playerID pid2, p.playerName pname2, pl.team FROM player p
LEFT JOIN plays pl ON p.playerId = pl.player WHERE pl.team IS NULL) N
ON (pid < pid2)
)

It's not a big deal; here is a solution:
with gigo as(select a.playerid as playerid,count(b.teamname) as nteams from player a
full outer join plays c on a.playerid=c.playerid full outer join team b
on b.teamid=c.teamid group by a.playerid)
select array_agg(a.*),g.nteams from player a inner join gigo g on a.playerid=g.playerid
group by g.nteams having count(a.*)>1 order by g.nteams desc

This solution works for me:
SELECT TMP1.PLAYERID,TMP2.PLAYERID FROM
(
SELECT a.playerid , a.teamid,b.team_sum
FROM plays A
INNER JOIN
(
SELECT PLAYERID,SUM(teamid) AS team_sum
FROM plays
GROUP BY 1
) B
ON a.playerid=b.playerid
) TMP1
INNER JOIN
(
SELECT a.playerid , a.teamid,b.team_sum
FROM plays A
INNER JOIN
(
SELECT PLAYERID,SUM(teamid) AS team_sum
FROM plays
GROUP BY 1
) B
ON a.playerid=b.playerid
)TMP2
ON TMP1.PLAYERID < TMP2.PLAYERID
AND TMP1.TEAMID=TMP2.TEAMID
AND TMP1.TEAM_SUM=TMP2.TEAM_SUM
GROUP BY 1,2
UNION ALL
SELECT n1,n2 FROM
(
SELECT TMP3.PLAYERID AS n1,TMP4.PLAYERID AS n2 FROM
PLAYER TMP3
INNER JOIN PLAYER TMP4
ON TMP3.PLAYERID<TMP4.PLAYERID
WHERE TMP3.PLAYERID NOT IN (SELECT PLAYERID FROM plays )
AND tmp4.playerid NOT IN (SELECT playerid FROM plays)
) TMP5

Two possible solutions come to mind:
Cursor - Loop through each player and compare him to all the others until reaching a conclusion.
Recursive query - Same idea, though slightly more complicated, but definitely the better way to do it. It probably also has better performance.
Can you provide some sample data so that I can create an example?

It seems like the basic datatype you want is sets, rather than arrays. So one option may be to use PL/Python with code similar to that below (see bottom of this answer for a function that might be adapted to this end). Of course, this isn't a "pure SQL" approach by any means.
But sticking to PostgreSQL (albeit not standard SQL), you may also want to use DISTINCT with array_agg. Note that the following only gives the first pair that meets the criteria (in principle there could be many more).
WITH teams AS (
SELECT playerID, array_agg(DISTINCT teamID ORDER BY teamID) AS teams
FROM plays
GROUP BY playerID),
teams_w_nulls AS (
SELECT a.playerID, b.teams
FROM player AS a
LEFT JOIN teams AS b
ON a.playerID=b.playerID),
player_sets AS (
SELECT teams, array_agg(DISTINCT playerID ORDER BY playerID) AS players
FROM teams_w_nulls
GROUP BY teams
-- exclude players who only share a team list with themselves.
HAVING array_length(array_agg(DISTINCT playerID ORDER BY playerID),1)>1)
SELECT a.teams, b.playerID, b.playerName, c.playerID, c.playerName
FROM player_sets AS a
INNER JOIN player AS b
ON a.players[1]=b.playerID
INNER JOIN player AS c
ON a.players[2]=c.playerID;
The query above gives the following output:
teams | playerid | playername | playerid | playername
-------+----------+------------+----------+------------
{1,3} | 2 | Allen | 3 | Pierce
| 4 | Garnett | 5 | Perkins
(2 rows)
Example PL/Python functions:
CREATE OR REPLACE FUNCTION set(the_list integer[])
RETURNS integer[] AS
$BODY$
return list(set(the_list))
$BODY$
LANGUAGE plpython2u;
CREATE OR REPLACE FUNCTION pairs(a_set integer[])
RETURNS SETOF integer[] AS
$BODY$
def pairs(x):
    for i in range(len(x)):
        for j in x[i+1:]:
            yield [x[i], j]
return list(pairs(a_set))
$BODY$
LANGUAGE plpython2u;
SELECT set(ARRAY[1, 1, 2, 3, 4, 5, 6, 6]);
Version of code above using these functions (output is similar, but this approach selects all pairs when there is more than one for a given set of teams):
WITH teams AS (
SELECT playerID, set(array_agg(teamID)) AS teams
FROM plays
GROUP BY playerID),
teams_w_nulls AS (
SELECT a.playerID, b.teams
FROM player AS a
LEFT JOIN teams AS b
ON a.playerID=b.playerID),
player_pairs AS (
SELECT teams, pairs(set(array_agg(playerID))) AS pairs
FROM teams_w_nulls
GROUP BY teams)
-- no need to exclude players who only share a team
-- list with themselves.
SELECT teams, pairs[1] AS player_1, pairs[2] AS player_2
FROM player_pairs;

We build a query that returns, per player, the count of teams and the sum of ascii(team_name) + team_id; call that sum team_value. We then self-join this query with itself where the counts and team_values match but the player IDs differ, which gives us the IDs we want to fetch.
select * from player where player_id in
(
select set2.player_id orig
from
(select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
from plays a, player b , team c
where a.player_id(+)=b.player_id
and a.team_id = c.team_id(+)
group by b.player_id) set1,
(select count(*) count,b.player_id , nvl(sum(a.team_id+ascii(team_name)),0) team_value
from plays a, player b , team c
where a.player_id(+)=b.player_id
and a.team_id = c.team_id(+)
group by b.player_id) set2
where set1.count=set2.count and set1.team_value=set2.team_value
and set1.player_id<>set2.player_id
)

Here is a simple query with UNION and 2-3 simple joins.
The 1st query, before the UNION, returns the playername and playerid of players who have played for the same number of teams.
The 2nd query, after the UNION, returns the playername and playerid of players who have not played for any team at all.
Copy and paste this query and execute it; you should see the expected results.
select playername,c.playerid from
(select a.cnt, a.playerid from
(select count(1) cnt , PLAYERID from plays group by PLAYERID) a ,
(select count(1) cnt , PLAYERID from plays group by PLAYERID) b
where a.cnt=b.cnt
and a.playerid<> b.playerid ) c ,PLAYER d
where c.playerid=d.playerid
UNION
select e.playername,e.playerid
from player e
left outer join plays f on
e.playerid=f.playerid where nvl(teamid,0 )=0

Try this one:
Here, test is the PLAYS table in your question.
select group_concat(b.name),a.teams from
(SELECT playerid, group_concat(distinct teamid ORDER BY teamid) AS teams
FROM test
GROUP BY playerid) a, player b
where a.playerid=b.playerid
group by a.teams
union
select group_concat(c.name order by c.playerid),null from player c where c.playerid not in (select playerid from test);

For anyone interested, this simple query works for me
SELECT UNIQUE PLR1.PID,PLR1.PNAME, PLR2.PID, PLR2.PNAME
FROM PLAYS PLY1,PLAYS PLY2, PLAYER PLR1, PLAYER PLR2
WHERE PLR1.PID < PLR2.PID AND PLR1.PID = PLY1.PID(+) AND PLR2.PID = PLY2.PID(+)
AND NOT EXISTS(( SELECT PLY3.TEAMID FROM PLAYS PLY3 WHERE PLY3.PID = PLR1.PID)
MINUS
( SELECT PLY4.TEAMID FROM PLAYS PLY4 WHERE PLY4.PID = PLR2.PID));

WITH temp AS (
SELECT p.playerid, p.playername, listagg(t.teamname,',') WITHIN GROUP (ORDER BY t.teamname) AS teams
FROM player p full OUTER JOIN plays p1 ON p.playerid = p1.playerid
LEFT JOIN team t ON p1.teamid = t.teamid GROUP BY (p.playerid , p.playername))
SELECT concat(concat(t1.playerid,','), t1.playername), t1.teams
FROM temp t1 WHERE nvl(t1.teams,' ') IN (
SELECT nvl(t2.teams,' ') FROM temp t2
WHERE t1.playerid <> t2.playerid)
ORDER BY t1.playerid

This is ANSI SQL, without using any special functions.
SELECT TAB1.T1_playerID AS playerID1 , TAB1.playerName1 ,
TAB1.T2_playerID AS playerID2, TAB1. playerName2
FROM
(select T1.playerID AS T1_playerID , T3. playerName AS playerName1 ,
T2.playerID AS T2_playerID , T4. playerName AS playerName2 ,COUNT (T1.TeamID) AS MATCHING_TEAM_ID_CNT
FROM PLAYS T1
INNER JOIN PLAYS T2 ON( T1.TeamID = T2.TeamID AND T1.playerID <> T2.playerID )
INNER JOIN player T3 ON ( T1.playerID=T3.playerID)
INNER JOIN player T4 ON ( T2.playerID=T4.playerID)
GROUP BY 1,2,3,4
) TAB1
INNER JOIN
( SELECT T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
FROM PLAYS T1
GROUP BY T1.playerID) TAB2
ON(TAB1.T2_playerID=TAB2.playerID AND
TAB1.MATCHING_TEAM_ID_CNT =TAB2.TOTAL_TEAM_CNT)
INNER JOIN
( SELECT T1.playerID AS playerID, COUNT(T1.TeamID) AS TOTAL_TEAM_CNT
FROM PLAYS T1
GROUP BY T1.playerID
) TAB3
ON( TAB1. T1_playerID = TAB3.playerID AND
TAB1.MATCHING_TEAM_ID_CNT=TAB3.TOTAL_TEAM_CNT)
WHERE TAB1.T1_playerID < TAB1.T2_playerID
UNION ALL (
SELECT T1.playerID, T1.playerName ,T2.playerID,T2.playerName
FROM
PLAYER T1 INNER JOIN PLAYER T2
ON (T1.playerID<T2.playerID)
WHERE T1.playerID NOT IN ( SELECT playerID FROM PLAYS)
AND T2.playerID NOT IN ( SELECT playerID FROM PLAYS))

Assuming your teamID is unique, this query will work. It identifies players that have the exact same teams by summing the teamID per player (the sum is NULL if the player has no teams), then counts the number of players per sum. I tested it using SQL Fiddle in PostgreSQL 9.3.
SELECT
b.playerID
,b.playerName
FROM (
--Join the totals of teams to your player information and then count over the team matches.
SELECT
p.playerID
,p.playerName
,m.TeamMatches
,COUNT(*) OVER(PARTITION BY TeamMatches) as Matches
FROM player p
LEFT JOIN (
--Assuming your teamID is unique as it should be. If it is then a sum of the team ids for a player will give you each team they play for.
--If for some reason your team id is not unique then rank the table and join same as below.
SELECT
ps.playerName
,ps.playerID
,SUM(t.teamID) as TeamMatches
FROM plays p
LEFT JOIN team t ON p.teamID = t.teamID
LEFT JOIN player ps ON p.playerID = ps.playerID
GROUP BY
ps.playerName
,ps.playerID
) m ON p.playerID = m.playerID
) b
WHERE
b.Matches <> 1
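One caveat on the sum-based signature: unique team IDs do not make sums unique, so two different team sets can add up to the same value. A minimal sketch in Python with SQLite shows this; the rows (players 10 and 11) are made up for illustration and are not from the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plays (playerID INTEGER, teamID INTEGER);
    -- Hypothetical rows: player 10 is on teams {1, 4}, player 11 on {2, 3}.
    INSERT INTO plays VALUES (10, 1), (10, 4), (11, 2), (11, 3);
""")

# Sum-based signature: one number per player.
sums = dict(conn.execute(
    "SELECT playerID, SUM(teamID) FROM plays GROUP BY playerID"
).fetchall())

# Set-based check: how many teams does player 10 have that player 11 lacks?
missing = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT teamID FROM plays WHERE playerID = 10
        EXCEPT
        SELECT teamID FROM plays WHERE playerID = 11
    )
""").fetchone()[0]

print(sums)     # the two sums are equal
print(missing)  # but the team sets are not
```

So a sum (or any other single number) is best treated as a quick pre-filter; an exact answer needs a real set comparison.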

This query should solve it by doing a self join on PLAYS:
- Join on matching team IDs, with the player IDs ordered so each pair appears once.
- Compare the matching row count with the total team count for the player.
select p1.playerId, p2.playerId, count(p1.playerId)
from plays p1, plays p2
WHERE p1.playerId<p2.playerId
and p1.teamId = p2.teamId
GROUP BY p1.playerId, p2.playerId
having count(*) = (select count(*) from plays where playerid = p1.playerid)
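Note that the HAVING clause above only compares the match count with p1's total, so a pair also passes when p1's teams are a strict subset of p2's. Here is a rough sketch in Python with SQLite that compares against both totals. It uses the plays rows from the question plus a hypothetical player 6 (teams 1, 3, 4) that I added to trigger the subset case, and it assumes each (playerID, teamID) row appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plays (playerID INTEGER, teamID INTEGER);
    -- Rows from the question's sample data ...
    INSERT INTO plays VALUES (1,1),(1,2),(1,3),(2,1),(2,3),(3,1),(3,3);
    -- ... plus a hypothetical player 6 on teams 1, 3 and 4.
    INSERT INTO plays VALUES (6,1),(6,3),(6,4);
""")

BASE = """
    SELECT p1.playerID, p2.playerID
    FROM plays p1 JOIN plays p2
      ON p1.playerID < p2.playerID AND p1.teamID = p2.teamID
    GROUP BY p1.playerID, p2.playerID
    HAVING COUNT(*) = (SELECT COUNT(*) FROM plays WHERE playerID = p1.playerID)
"""
ORDER = " ORDER BY p1.playerID, p2.playerID"

# Original condition: shared teams == p1's team count only.
one_sided = conn.execute(BASE + ORDER).fetchall()

# Added condition: shared teams must also equal p2's team count.
two_sided = conn.execute(
    BASE
    + " AND COUNT(*) = (SELECT COUNT(*) FROM plays WHERE playerID = p2.playerID)"
    + ORDER
).fetchall()

print(one_sided)
print(two_sided)
```

Either way, players with no rows in plays never show up in this self join, so the "no teams" pairs still need a separate query.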

Create the function in SQL Server 2008:
CREATE FUNCTION [dbo].[fngetTeamIDs] ( @PayerID int ) RETURNS varchar(101) AS
BEGIN
declare @str varchar(1000)
SELECT @str = coalesce(@str + ', ', '') + CAST(a.TeamID AS varchar(100)) FROM (SELECT DISTINCT TeamID from Plays where PayerId=@PayerID) a
return @str
END
--select dbo.fngetTeamIDs(2)
The query starts here:
drop table #temp,#A,#B,#C,#D
(select PayerID,count(*) count
into #temp
from Plays
group by PayerID)
select *
into #A
from #temp as T
where T.count in (
select T1.count from #temp as T1
group by T1.count having count(T1.count)>1
)
select A.*,P.TeamID
into #B
from #A A inner join Plays P
on A.PayerID=P.PayerID
order by A.count
select B.PayerId,B.count,
(
select dbo.fngetTeamIDs(B.PayerId)
) as TeamIDs
into #C
from #B B
group by B.PayerId,B.count
select TeamIDs
into #D
from #c as C
group by C.TeamIDs
having count(C.TeamIDs)>1
select C.PayerId,P.PlayerName,D.TeamIDs
from #D D inner join #C C
on D.TeamIDs=C.TeamIDs
inner join Player P
on C.PayerID=P.PlayerID

Related

The difference between the minimum and maximum number of games

Question: Show the names of all players who have the following:
the difference between the minimum and maximum number of games
of this player is greater than 5.
select p.name
from player p
join competition c
on c.playerID = p.playerID
where (
(select count(*) from competition
where count(games) > 1
group by playerID
) - (
select count(*) from competition
where count(games) <= 1
group by playerID
))> 5;
I'm kind of lost. I'm not sure this is the right way or how I should proceed: should I use count to find the minimum and maximum number of games and compare the difference with 5, or should I use the min and max functions instead of count? I would be very grateful if someone could explain the logic of this to me.
Tables:
player competition
------- --------
playerID playerID
name games
birthday date
address
telefon
SELECT
P.Name,
MIN(C.Games) MinGame,
MAX(C.Games) MaxGame
FROM Player P
INNER JOIN Competition C
ON C.PlayerId = P.PlayerId
GROUP BY P.PlayerId, P.Name
HAVING MAX(C.Games) - MIN(C.Games) > 5
It should be a simple query:
With tab1 AS (Select player.name, min(games) min_game, max(games) max_game,
max(games) - min(games) diff
from player JOIN competition ON player.playerID = competition.playerID
group by player.playerID, player.name)
Select tab1.name from tab1
WHERE diff > 5;
I am adding playerID to the group by because two people could share the same name.
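To see the MIN/MAX idea run end to end, here is a small sketch in Python with SQLite. The table and column names follow the question; the two players and their games values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (playerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE competition (playerID INTEGER, games INTEGER);
    -- Hypothetical data: Ann ranges from 2 to 9 games, Bob from 4 to 6.
    INSERT INTO player VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO competition VALUES (1, 2), (1, 9), (2, 4), (2, 6);
""")

# Group per player, then filter the groups on the spread of games.
names = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM player p
    JOIN competition c ON c.playerID = p.playerID
    GROUP BY p.playerID, p.name
    HAVING MAX(c.games) - MIN(c.games) > 5
""")]

print(names)
```

The filter has to live in HAVING rather than WHERE because it depends on aggregates computed per group.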

Select a list of all game titles bought by those who have purchased a specified game

I want to select the list of games purchased by players who bought 'this_game'
Here is my base of 3 tables :
PLAYERS(IDP INT PK, PSEUDO VARCHAR)
GAMES(IDG INT PK, TITLE VARCHAR)
SALES(IDP INT FK, IDG INT FK)
I've tried something like that but it's not correct:
SELECT TITLE from SALES
JOIN GAMES on SALES.IDG = GAMES.IDG
JOIN PLAYERS on SALES.IDP = PLAYERS.IDP
GROUP BY TITLE
HAVING TITLE= 'this_game'
I've tried different things but none of them worked.
With a join on all 3 tables:
select distinct g.title
from games g
inner join sales s on s.idg = g.idg
inner join players p on p.idp = s.idp
where p.idp in (
select s.idp
from sales s inner join games g
on g.idg = s.idg
where g.title = 'this_game'
)
You do not need the players table to solve this -- nor a subquery, although that is not a problem:
select distinct g2.title
from games g1 join
sales s1
on g1.idg = s1.idg and
g1.title = 'this_game' join
sales s2
on s1.idp = s2.idp join -- same person
games g2
on g2.idg = s2.idg;
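Here is the self-join version as a quick runnable sketch in Python with SQLite. The table names come from the question; the game titles other than 'this_game' and the buyer IDs (100, 200) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (idg INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sales (idp INTEGER, idg INTEGER);
    INSERT INTO games VALUES (1, 'this_game'), (2, 'other_a'), (3, 'other_b');
    -- Hypothetical buyers: 100 bought this_game and other_a, 200 bought other_b.
    INSERT INTO sales VALUES (100, 1), (100, 2), (200, 3);
""")

# s1 finds the buyers of 'this_game'; s2 finds everything those buyers bought.
titles = [row[0] for row in conn.execute("""
    SELECT DISTINCT g2.title
    FROM games g1
    JOIN sales s1 ON s1.idg = g1.idg AND g1.title = 'this_game'
    JOIN sales s2 ON s2.idp = s1.idp
    JOIN games g2 ON g2.idg = s2.idg
    ORDER BY g2.title
""")]

print(titles)
```

As written, the result includes 'this_game' itself; adding a condition such as g2.idg <> g1.idg would leave only the other titles.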

SQL - Selecting highest scores for different categories

Let's say I've got a db with 3 tables:
Players (PK id_player, name...),
Tournaments (PK id_tournament, name...),
Game (PK id_turn, FK id_tournament, FK id_player and score)
Players participate in tournaments. The table called Game keeps track of each player's score for different tournaments.
I want to create a view that looks like this:
tournament_name Winner highest_score
Tournament_1 Jones 300
Tournament_2 White 250
I tried different approaches, but I'm fairly new to SQL (and also to this forum).
I tried using a union all clause like:
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '1' group by "Id_player" order by
"Score" desc) where rownum <= 1
union all
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '2' group by "Id_player" order by
"Score" desc) where rownum <= 1;
And of course it works, but whenever a tournament happens, I would have to manually add a select statement to this with Id_tournament = nextvalue.
EDIT:
So let's say that the player with id 1 scored 50 points in tournament a and player 2 scored 40 points. Player 1 wins, so the table should show only player 1 as the winner of this tournament (or, if possible, 2 or more players if it's a tie). The next row shows the winner of the second tournament. I don't think I'm going to put multiple games for one player in the same tournament, but if I did, it would probably take the avg of all his scores.
EDIT2:
Create table scripts:
create table players
(id_player numeric(5) constraint pk_id_player primary key, name
varchar2(50));
create table tournaments
(id_tournament numeric(5) constraint pk_id_tournament primary key,
name varchar2(50));
create table game
(id_game numeric(5) constraint pk_game primary key, id_player
numeric(5) constraint fk_id_player references players(id_player),
id_tournament numeric(5) constraint fk_id_tournament references
tournaments(id_tournament), score numeric(3));
FINAL EDIT:
OK, in case anyone is wondering, I used Jorge Campos's script, changed it a bit, and it works. Thank you all for helping. Unfortunately I cannot upvote comments yet, so I can only thank you by posting. Here's the final script:
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select g.id_tournament, g.id_player,
row_number() over (partition by t.name order by
score desc) as rd from game g join tournaments t on
g.id_tournament = t.id_tournament
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc;
This query could be simplified depending on the RDBMS you are using.
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select id_tournament,
id_player,
row_number() over (partition by id_tournament order by score desc) as rd
from game
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc
Assuming what you want is to display the high score of each player in each tournament,
your query in MS SQL Server would look like this:
select
t.name as tournament_name,
p.name as Winner,
Max(g.score) as [Highest_Score]
from Tournaments t
Inner join Game g on t.id_tournament=g.id_tournament
inner join Players p on p.id_player=g.id_player
group by
g.id_tournament,
g.id_player,
t.name,
p.name
Please check if this works for you:
SELECT tournemntData.id_tournament ,
tournemntData.name ,
dbo.Players.name ,
tournemntData.Score
FROM dbo.Game
INNER JOIN ( SELECT dbo.Tournaments.id_tournament ,
dbo.Tournaments.name ,
MAX(dbo.Game.score) AS Score
FROM dbo.Game
INNER JOIN dbo.Tournaments ON Tournaments.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
GROUP BY dbo.Tournaments.id_tournament ,
dbo.Tournaments.name
) tournemntData ON tournemntData.id_tournament =Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
WHERE tournemntData.Score = dbo.Game.score
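The same "join back to the per-tournament maximum" idea can be tried out with a small sketch in Python with SQLite. The table layout follows the create scripts in the question; the rows are made up so that the result matches the view the question asks for, and I assume one game row per player per tournament (no averaging).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id_player INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tournaments (id_tournament INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE game (id_game INTEGER PRIMARY KEY, id_player INTEGER,
                       id_tournament INTEGER, score INTEGER);
    -- Hypothetical rows.
    INSERT INTO players VALUES (1, 'Jones'), (2, 'White');
    INSERT INTO tournaments VALUES (1, 'Tournament_1'), (2, 'Tournament_2');
    INSERT INTO game VALUES (1, 1, 1, 300), (2, 2, 1, 200),
                            (3, 1, 2, 100), (4, 2, 2, 250);
""")

# Derived table m holds the top score per tournament; joining game rows
# back to it keeps every player who reached that score (ties included).
winners = conn.execute("""
    SELECT t.name, p.name, g.score
    FROM game g
    JOIN tournaments t ON t.id_tournament = g.id_tournament
    JOIN players p ON p.id_player = g.id_player
    JOIN (SELECT id_tournament, MAX(score) AS top_score
          FROM game
          GROUP BY id_tournament) m
      ON m.id_tournament = g.id_tournament AND m.top_score = g.score
    ORDER BY t.name
""").fetchall()

print(winners)
```

Unlike row_number() with rd = 1, this keeps all tied winners, which is what the question's edit mentions as a nice-to-have.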

Complicated SQL Query involving multiple tables

In this query, I have to list pairs of players, with their playerID and playerName, who play for the exact same teams. If a player plays for 3 teams, the other has to play for exactly the same 3 teams. No less, no more. If two players currently do not play for any team, they should also be included. The query should return (playerID1, playerName1, playerID2, playerName2) with no repetition: if player 1's info comes before player 2's, there should not be another tuple with player 2's info coming before player 1's.
For example, if player A plays for the Yankees and Red Sox, and player B plays for the Yankees, Red Sox, and Dodgers, I should not get them. They both have to play for the Yankees and Red Sox and no one else. Right now this query finds an answer if the players play for any same team.
player(playerID: integer, playerName: string)
team(teamID: integer, teamName: string, sport: string)
plays(playerID: integer, teamID: integer)
Right now the Query I have is
SELECT p1.playerID, p1.playerName, p2.playerID, p2.playerName
FROM player p1, player p2, plays
WHERE p1.teamID = p2.teamID AND teamID in.....
I am stuck on how to approach it after this. Any hints on how to approach this problem? Thanks for your time.
I think the easiest approach is to concatenate the teams together and just join on the results. Postgres provides the function string_agg() to aggregate strings:
select p1.playerId, p1.playerName, p2.playerId, p2.playerName
from (select p.playerId, string_agg(cast(p.TeamId as varchar(255)), ',' order by p.TeamId) as teams,
pp.PlayerName
from plays p join
player pp
on p.playerId = pp.playerId
group by p.playerId, pp.PlayerName
) p1 join
(select p.playerId, string_agg(cast(p.TeamId as varchar(255)), ',' order by p.TeamId) as teams,
pp.PlayerName
from plays p join
player pp
on p.playerId = pp.playerId
group by p.playerId, pp.PlayerName
) p2
on p1.playerid < p2.playerid and p1.teams = p2.teams;
EDIT:
You can do this without string_agg. The idea is to start with a list of all possible player combinations.
Then, join in the teams for the first player using left outer join. And join in the teams for the second by using full outer join and matching on the team and driver name. The reason you need the driver table is to be sure that the id/name does not get lost in the full outer join:
select driver.playerid1, driver.playerid2
from (select p1.playerId as playerId1, p1.playerName as playerName1,
p2.playerId as playerId2, p2.playerName as playerName2
from player p1 cross join
player p2
where p1.playerId < p2.playerId
) driver left outer join
plays p1
on p1.playerId = driver.playerId1 full outer join
plays p2
on p2.playerId = driver.playerId2 and
p2.teamid = p1.teamid
group by driver.playerid1, driver.playerid2
having count(p1.playerid) = count(*) and
count(p2.playerid) = count(*);
This joins two players on the team id (with ordering so a pair only gets considered once). It then says there is a match when all the rows for the two players have non-NULL team values. This is perhaps more clear with the equivalent having clause:
having sum(case when p1.playerid is null then 1 else 0 end) = 0 and
sum(case when p2.playerid is null then 1 else 0 end) = 0;
The full outer join will produce NULL values when two players have teams that don't match. So, no NULL values mean that all the teams match.
This is an adaptation of my answer to a previous question of yours.
Get all unique combinations of players using a triangular join:
SELECT p1.playerID, p1.playerName, p2.playerID, p2.playerName
FROM player p1
INNER JOIN player p2 ON p1.playerID < p2.playerID
Subtract the second player's team set from that of the first player and check if there are no rows in the result:
NOT EXISTS (
SELECT teamID
FROM plays
WHERE playerID = p1.playerID
EXCEPT
SELECT teamID
FROM plays
WHERE playerID = p2.playerID
)
Swap the sets, subtract and check again:
NOT EXISTS (
SELECT teamID
FROM plays
WHERE playerID = p2.playerID
EXCEPT
SELECT teamID
FROM plays
WHERE playerID = p1.playerID
)
Finally, apply both conditions to the result of the triangular join in Step 1.
SELECT p1.playerID, p1.playerName, p2.playerID, p2.playerName
FROM player p1
INNER JOIN player p2 ON p1.playerID < p2.playerID
WHERE
NOT EXISTS (
SELECT teamID
FROM plays
WHERE playerID = p1.playerID
EXCEPT
SELECT teamID
FROM plays
WHERE playerID = p2.playerID
)
AND
NOT EXISTS (
SELECT teamID
FROM plays
WHERE playerID = p2.playerID
EXCEPT
SELECT teamID
FROM plays
WHERE playerID = p1.playerID
)
;
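For anyone who wants to try the final query, here is a sketch that runs it in Python with SQLite against the sample data from the original question. SQLite accepts EXCEPT inside the correlated subqueries; other engines may spell it MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (playerID INTEGER PRIMARY KEY, playerName TEXT);
    CREATE TABLE plays (playerID INTEGER, teamID INTEGER);
    INSERT INTO player VALUES (1,'Rondo'),(2,'Allen'),(3,'Pierce'),
                              (4,'Garnett'),(5,'Perkins');
    INSERT INTO plays VALUES (1,1),(1,2),(1,3),(2,1),(2,3),(3,1),(3,3);
""")

# Two sets are equal when both differences (A minus B, B minus A) are empty.
pairs = conn.execute("""
    SELECT p1.playerID, p1.playerName, p2.playerID, p2.playerName
    FROM player p1
    JOIN player p2 ON p1.playerID < p2.playerID
    WHERE NOT EXISTS (
        SELECT teamID FROM plays WHERE playerID = p1.playerID
        EXCEPT
        SELECT teamID FROM plays WHERE playerID = p2.playerID
    )
    AND NOT EXISTS (
        SELECT teamID FROM plays WHERE playerID = p2.playerID
        EXCEPT
        SELECT teamID FROM plays WHERE playerID = p1.playerID
    )
    ORDER BY p1.playerID, p2.playerID
""").fetchall()

for row in pairs:
    print(row)
```

Players without any rows in plays have empty team sets, so both differences are empty and such pairs are included without a separate UNION.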

Limiting recursion to certain level

I have a SQL table named Player and another called Team.
Each Player MUST belong to a team via a foreign key TeamID.
Each Team can belong to another Team via a recursive field ParentTeamID.
So it could be (top down)...
TeamA
  TeamB
    Team76
      Group8
        Player_ME
My question is, if I'm given a Player's PlayerID (the PK for that table), what is the best way to get the top Team?
My query so far (which gets all teams):
WITH TeamTree
AS (
SELECT ParentTeam.*, Player.PlayerID, 0 as Level
FROM Team ParentTeam
INNER JOIN Player ON Player.TeamID = ParentTeam.TeamID
WHERE Player.PlayerID IN (SELECT * FROM dbo.Split(@PlayerIDs,','))
UNION ALL
SELECT ChildTeam.*, TeamTree.PlayerID AS PlayerID, TeamTree.Level + 1
FROM Team ChildTeam
INNER JOIN TeamTree TeamTree
ON ChildTeam.TeamID = TeamTree.ParentTeamID
)
Now whilst I think this is the right place to start I think there may be a better way. Plus I'm kinda stuck! I tried using Level in a join (inside a subquery) but it didn't work.
Any ideas on how to work my way up the tree and get only the top level details?
Edit:
A ParentTeam CAN have a ParentTeam (infinite recursion), but a Player can only belong to one Team.
Data Structure
Team:
TeamID (PK), Name, ParentTeamID (Recursive field)
Player:
PlayerID (PK), Name, TeamID (FK)
Sample Data:
Team:
1, TeamA, NULL
2, TeamB, 1
3, Team76, 2
4, Group8, 3
Player:
1, Player_ME, 4
2, Player_TWO, 2
So with the above data, both players should show (in the query) that they have a "TopLevelTeam" of TeamA
I believe this is what you are looking for, with a bit of extra info thrown in for free :-)
Andrew had the correct idea in his edited version, but I think his implementation is incorrect.
The schema and query are available at SQL Fiddle
with teamCTE as (
select TeamID,
TeamName,
cast(null as int) as ParentTeamID,
cast(null as varchar(10)) as ParentTeamName,
TeamID TopTeamID,
TeamName TopTeamName,
1 as TeamLevel
from team
where ParentTeamID is null
union all
select t.TeamID,
t.TeamName,
c.TeamID,
c.TeamName,
c.TopTeamID,
c.TopTeamName,
TeamLevel+1 as TeamLevel
from team t
join teamCTE c
on t.ParentTeamID = c.TeamID
)
select p.PlayerID,
p.PlayerName,
t.*
from player p
join teamCTE t
on p.TeamID = t.TeamID
EDIT - answer to question in comment
You can navigate to any level within the player's team hierarchy simply by joining to the CTE a 2nd time. In your case you asked for the 2nd top most team: SQL Fiddle
with teamCTE as (
select TeamID,
TeamName,
cast(null as int) as ParentTeamID,
cast(null as varchar(10)) as ParentTeamName,
TeamID TopTeamID,
TeamName TopTeamName,
1 as TeamLevel
from team
where ParentTeamID is null
union all
select t.TeamID,
t.TeamName,
c.TeamID,
c.TeamName,
c.TopTeamID,
c.TopTeamName,
TeamLevel+1 as TeamLevel
from team t
join teamCTE c
on t.ParentTeamID = c.TeamID
)
select p.PlayerID,
p.PlayerName,
t1.*,
t2.TeamID Level2TeamID,
t2.TeamName Level2TeamName
from player p
join teamCTE t1
on p.TeamID = t1.TeamID
join teamCTE t2
on t1.TopTeamID = t2.TopTeamID
and t2.TeamLevel=2
WITH TeamTree
AS (
SELECT ParentTeam.*, Player.PlayerID AS UrPlayerID, 0 as Level
FROM Team ParentTeam
INNER JOIN Player ON Player.TeamID = ParentTeam.TeamID
WHERE Player.PlayerID IN (SELECT * FROM dbo.Split(@PlayerIDs,','))
UNION ALL
SELECT ChildTeam.*, TeamTree.PlayerID AS PlayerID, TeamTree.Level + 1
FROM Team ChildTeam
INNER JOIN TeamTree TeamTree
ON ChildTeam.ParentTeamID = TeamTree.TeamID /* These were reversed, I think */
AND UrPlayerID=ChildTeam.PlayerID /* ADDED */
)
Otherwise you get a huge duplication of rows, something like the square of the number of players, don't you?
--
(After comment below)
Quite right, I misread the schema. Look, you don't need to bring the player in until the very end. I thought the team tree arrangement might differ by player, but it doesn't. So
WITH recursive TeamTree AS (
SELECT TeamID, ParentTeamID FROM Team
UNION ALL
SELECT T1.TeamID, T2.ParentTeamID FROM TeamTree T1 JOIN Team T2 ON T1.ParentTeamID=T2.TeamID
)
SELECT TeamTree.* FROM TeamTree JOIN Team T3
ON TeamTree.ParentTeamID=T3.TeamID WHERE T3.ParentTeamID IS NULL;
This gives you a table of each team and its root ancestor. Now join that to the player table.
SELECT * FROM Player JOIN (WITH recursive TeamTree AS (
SELECT TeamID, ParentTeamID FROM Team
UNION ALL
SELECT T1.TeamID, T2.ParentTeamID FROM TeamTree T1 JOIN Team T2 ON T1.ParentTeamID=T2.TeamID
)
SELECT TeamTree.* FROM TeamTree JOIN Team T3
ON TeamTree.ParentTeamID=T3.TeamID WHERE T3.ParentTeamID IS NULL) teamtree2
ON Player.TeamID=teamtree2.TeamID;
You can rejoin with Team if you need more columns.
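As a cross-check, here is a minimal sketch in Python with SQLite that walks up from each player's own team until it reaches a team with no parent, using the sample data from the question. The CTE name climb is my own choice, and the syntax is SQLite's WITH RECURSIVE rather than SQL Server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamID INTEGER PRIMARY KEY, Name TEXT, ParentTeamID INTEGER);
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, Name TEXT, TeamID INTEGER);
    INSERT INTO Team VALUES (1,'TeamA',NULL),(2,'TeamB',1),(3,'Team76',2),(4,'Group8',3);
    INSERT INTO Player VALUES (1,'Player_ME',4),(2,'Player_TWO',2);
""")

top_teams = conn.execute("""
    WITH RECURSIVE climb(PlayerID, TeamID, ParentTeamID) AS (
        -- Anchor: each player's own team.
        SELECT p.PlayerID, t.TeamID, t.ParentTeamID
        FROM Player p JOIN Team t ON t.TeamID = p.TeamID
        UNION ALL
        -- Step: move from the current team to its parent.
        SELECT c.PlayerID, t.TeamID, t.ParentTeamID
        FROM climb c JOIN Team t ON t.TeamID = c.ParentTeamID
    )
    SELECT pl.Name, t.Name
    FROM climb c
    JOIN Player pl ON pl.PlayerID = c.PlayerID
    JOIN Team t ON t.TeamID = c.TeamID
    WHERE c.ParentTeamID IS NULL   -- keep only the root of each climb
    ORDER BY pl.PlayerID
""").fetchall()

print(top_teams)
```

Starting the recursion at the player keeps the work proportional to the depth of that player's chain, and it also covers a player who sits directly on a root team.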