Schema Normalization :: Composite Game Schedule Constrained by Team - schema

Related to the original generalized version of the problem: http://stackoverflow.com/questions/6068635/database-design-normalization-in-2-participant-event-join-table-or-2-column
As you'll see in the above thread, a game (event) is defined as exactly 2 teams (participants) playing each other on a given date (no teams play each other more than once in a day).
In our case we decided to go with a single composite schedule table with gameID PK, 2 columns for the teams (call them team1 & team2) and game date, time & location columns. Additionally, since two teams + date must be unique, we define a unique key on these combined fields. Separately we have a teams table with teamID PK related to schedule table columns team1 & team2 via FK.
This model works fine for us, but what I did not post in above thread is the relationship between scheduled games and results, as well as handling each team's "version" of the scheduled game (i.e. any notes team1 or team2 want to include, like, "this is a scrimmage against a non-divisional opponent and will not count in the league standings").
Our current table model is:
Teams > Composite Schedule > Results > Stats (tables for scoring & defense)
Teams > Players
Teams > Team Schedule*
*hack to handle notes issue and allow for TBD/TBA games where opponent, date, and/or location may not be known at time of schedule submission.
I can't help but think we can consolidate this model. For example, is there really a need for a separate results table? Couldn't the composite schedule be BOTH the schedule and the game result? This is where a join table could come into play.
Join table would effectively be a gameID generator consisting of:
gameID (PK)
gameDate
gameTime
location
Then revised composite schedule/results would be:
id (PK)
teamID (FK to teams table)
gameID (FK to join table)
gameType (scrimmage, tournament, playoff)
score (i.e. number of goals)
penalties
powerplays
outcome (win-loss-tie)
notes (team's version of the game)
Thoughts appreciated; it has been tricky trying to drill down to the central issue (thus the original question above).

I don't see any reason to have separate tables for the schedule and results. However, I would move "gameType" to the Games table, otherwise you're storing the same value twice. I'd also consider adding the teamIDs to the Games table. This will serve two purposes: it will allow you to easily distinguish between home and away teams and it will make writing a query that returns both teams' data on the same row significantly easier.
Games
gameID (PK)
gameDate
gameTime
homeTeamID
awayTeamID
location
gameType (scrimmage, tournament, playoff)
Sides
id (PK)
TeamID (FK to teams table)
gameID (FK to games table)
score
penalties
powerplays
notes
As shown, I would also leave out the "Outcome" field. That can be effectively and efficiently derived from the "Score" columns.
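As a rough sketch of deriving the outcome instead of storing it, the snippet below builds cut-down Games and Sides tables in an in-memory SQLite database and computes win/loss/tie by joining each side to its opponent's row. SQLite, the column types, the UNIQUE constraint and the sample rows are all assumptions made for illustration; they are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Games (
    gameID     INTEGER PRIMARY KEY,
    gameDate   TEXT,
    homeTeamID INTEGER,
    awayTeamID INTEGER,
    gameType   TEXT
);
CREATE TABLE Sides (
    id     INTEGER PRIMARY KEY,
    teamID INTEGER NOT NULL,
    gameID INTEGER NOT NULL REFERENCES Games(gameID),
    score  INTEGER,               -- NULL until the game has been played
    notes  TEXT,
    UNIQUE (gameID, teamID)       -- assumed: one row per team per game
);
-- Hypothetical sample data.
INSERT INTO Games VALUES (1, '2011-05-20', 10, 20, 'playoff');
INSERT INTO Games VALUES (2, '2011-05-21', 20, 30, 'scrimmage');
INSERT INTO Sides (teamID, gameID, score) VALUES (10, 1, 3), (20, 1, 1);
INSERT INTO Sides (teamID, gameID, score) VALUES (20, 2, 2), (30, 2, 2);
""")

# The outcome is computed from the two score rows of each game.
# With no ELSE branch, an unplayed game (NULL score) yields NULL.
outcomes = conn.execute("""
    SELECT mine.gameID, mine.teamID,
           CASE WHEN mine.score > theirs.score THEN 'win'
                WHEN mine.score < theirs.score THEN 'loss'
                WHEN mine.score = theirs.score THEN 'tie'
           END AS outcome
    FROM Sides AS mine
    JOIN Sides AS theirs
      ON theirs.gameID = mine.gameID
     AND theirs.teamID <> mine.teamID
    ORDER BY mine.gameID, mine.teamID
""").fetchall()

for game_id, team_id, outcome in outcomes:
    print(game_id, team_id, outcome)
```

If the derived value is needed often, the same SELECT could be wrapped in a view so callers still see an outcome column without it being stored.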

Related

What if your fact has multiple instances of the dimension?

In a star schema for a clothes shop, there is a transaction fact to capture everything bought. This will usually have the usual date, time, amount dimensions but it will also have a person to indicate who bought it. In certain cases you can have multiple people on the same transaction. How is that modelled if the foreign key is on the Fact table and hence can only point to one Person?
The standard technique in dimensional modelling is to use a 'Bridge' table.
The classic examples you'll find are groups of customers having accounts (or transactions), and for patients having multiple diagnoses when visiting a hospital.
In your case this might look like:
Table: FactTransaction
PersonGroupKey
Other FactTableColumns
Table: BridgePersonGroup
PersonGroupKey
PersonKey
Table: DimPerson
PersonKey
Other person columns
For each group of people you'd create a new PersonGroupKey and end up with rows like this:
PersonGroupKey 1, PersonKey 5
PersonGroupKey 1, PersonKey 3
PersonGroupKey 2, PersonKey 1
PersonGroupKey 3, PersonKey 6
PersonGroupKey then represents the group of people in the Fact.
Logically speaking, there should be a further table, DimPersonGroup, which just lists the PersonGroupKeys, but most databases don't require this so typically Kimball modellers do away with it.
That's the basics of the Bridge table, but you might consider modifications depending on your situation!
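A minimal sketch of the bridge layout above, using an in-memory SQLite database. The TransactionKey, Amount and Name columns and all sample values are assumptions added so the example runs; only the three table names and the two key columns come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPerson (
    PersonKey INTEGER PRIMARY KEY,
    Name      TEXT                      -- assumed attribute
);
CREATE TABLE BridgePersonGroup (
    PersonGroupKey INTEGER NOT NULL,
    PersonKey      INTEGER NOT NULL REFERENCES DimPerson(PersonKey),
    PRIMARY KEY (PersonGroupKey, PersonKey)
);
CREATE TABLE FactTransaction (
    TransactionKey INTEGER PRIMARY KEY, -- assumed surrogate key
    PersonGroupKey INTEGER NOT NULL,
    Amount         REAL                 -- assumed measure
);
INSERT INTO DimPerson VALUES (1, 'Ann'), (3, 'Bob'), (5, 'Cat'), (6, 'Dan');
-- Same group rows as listed above.
INSERT INTO BridgePersonGroup VALUES (1, 5), (1, 3), (2, 1), (3, 6);
INSERT INTO FactTransaction VALUES (100, 1, 59.90), (101, 2, 20.00);
""")

# Everyone attached to each transaction, resolved through the bridge.
buyers = conn.execute("""
    SELECT f.TransactionKey, p.Name
    FROM FactTransaction AS f
    JOIN BridgePersonGroup AS b ON b.PersonGroupKey = f.PersonGroupKey
    JOIN DimPerson AS p ON p.PersonKey = b.PersonKey
    ORDER BY f.TransactionKey, p.Name
""").fetchall()

print(buyers)
```

One thing to watch with this shape: joining through the bridge repeats the fact row once per person in the group, so summing Amount over that join would count a shared transaction more than once.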
You need a joining table, TransactionPerson (or something like that), where Person to TransactionPerson is a 1:M relationship and TransactionPerson to Transaction is an M:1 relationship.
That way you can have multiple people relating to one transaction indirectly.
I would propose to use a Bridge table in combination with your transaction and person tables. Ex:
Table: fact_transaction
transaction_id (primary key)
transaction_person_id (foreign key)
...
Table: bridge_transaction_person
transaction_person_id
person_id
Table: dim_person
person_id (primary key)
...

Writing SQL query to find ranking

I'm trying to determine for a given person how many people have a better score than they do, and group it by the different teams they belong to. So, in the tables below, I'm grabbing the list of team_id from the team_person table where the person_id matches the person I care about. That will get me all of the teams I belong to.
Then I need to know each person_id that is in any team I belong to so that I can find out what their maximum score is from the performances table.
Once I have that, I finally want to determine, for each team_id, how many people on that team have a better score than I do, where better is simply defined as having a larger value.
I've gotten way beyond my abilities with SQL at this point. What I have so far, which seems to get me the maximum score for all the people I care about, (basically everything but my final "by team" requirement) is this:
SELECT person_id, MAX(score) m
FROM performances
WHERE category_id = 7 AND person_id IN (
    -- Find all the people on the teams I belong to
    SELECT DISTINCT person_id
    FROM team_person
    WHERE team_id IN (
        -- Find all the teams that I belong to
        SELECT DISTINCT team_id
        FROM team_person
        WHERE person_id = 2
    )
)
GROUP BY person_id
ORDER BY 2 DESC
My two relevant tables are defined like so, and I'm using psql 9.1.15
Table "public.team_person"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
ident | integer | not null default nextval('team_person_ident_seq'::regclass)
team_id | integer | not null
person_id | integer | not null
*chop extraneous columns*
Indexes:
"team_person_pkey" PRIMARY KEY, btree (ident)
"teamPersonUnique" UNIQUE CONSTRAINT, btree (team_id, person_id)
Foreign-key constraints:
"team_person_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
"team_person_team_id_fkey" FOREIGN KEY (team_id) REFERENCES team(ident) ON DELETE CASCADE
Referenced by:
TABLE "roster" CONSTRAINT "roster_team_person_id_fkey" FOREIGN KEY (team_person_id) REFERENCES team_person(ident) ON DELETE SET NULL
Triggers:
update_team_person_modified BEFORE INSERT OR UPDATE ON team_person FOR EACH ROW EXECUTE PROCEDURE update_modified_column()
Table "public.performances"
Column | Type | Modifiers
-------------+--------------------------+--------------------------------------------------------------
ident | bigint | not null default nextval('performances_ident_seq'::regclass)
category_id | integer | not null
person_id | integer | not null
score | real | not null
*chop extraneous columns*
Indexes:
"performances_pkey" PRIMARY KEY, btree (ident)
Foreign-key constraints:
"performances_category_id_fkey" FOREIGN KEY (category_id) REFERENCES performance_categories(ident) ON DELETE CASCADE
"performances_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
First, state just the problem, without assumptions about how to get to the solution. You've done that fairly well:
determine for a given person how many people have a better score than they do, and group it by the different teams they belong to.
but I'd rephrase a bit:
For each team a given person is a member of, how many people in that team have a better score than the subject person?
I don't know about you, but it suddenly seems simpler now. Take the team table, left outer join team_person and filter for teams we're a member of, left outer join performances to find games we played with that team, left outer join team_person again to get other people who're members of each team, left outer join performances, filter out teams the subject person isn't a member of, group and aggregate.
It's underspecified for some corner cases (like a team where you're the only member, or a team where you didn't play a game), but eh, whatever.
Problems:
You haven't shown the team table, but since you don't care about anything in it, you can omit it from the join and just use team_person as the join root.
A note on team_person: a UNIQUE constraint on (team_id, person_id) is essential, and your schema does have one (teamPersonUnique). It would be better still as the primary key in place of the surrogate ident. It doesn't actually matter for this query because duplicate team memberships wouldn't change the result, but it's good data modelling: you can't be a member of a team more than once.
performances should also have a column identifying the particular game or whatever. Since you haven't shown one, I'm going to assume you mean that you're looking for people who, in any game, performed better than the subject person at least once, in that game or another game. If you actually want to find people who did better in a particular game then you need a suitable key on performances.
Fatal problem: performances is also missing a column linking the performance to the team. This makes it impossible to properly solve the problem because you can't get performances by a given person on a given team. I'm going to assume there is in fact a team_id on performances and you just left it out.
So, allowing for the above issues, I'd first acquire the data with a big join, then group and aggregate it. This join will give us, for each team we played in, for each of our performances, for each other player, for each of their other performances, one row with all the relevant information. You can then compare performances and aggregate.
The below is totally untested, since you didn't provide sample data and you chopped important parts out of your schema (or the schema is defective), but I'd try something like:
SELECT
    my_performances.team_id,
    -- Find how many distinct people scored better than us at least once,
    -- no matter how many times or in which game.
    COUNT(DISTINCT other_team_person.person_id)
-- Start the join with our team memberships and how we scored in each.
-- If we didn't play any games for this team don't produce a result row
-- for it, so INNER JOIN.
FROM team_person my_team_person
INNER JOIN performances my_performances ON (
    my_performances.person_id = my_team_person.person_id
    AND my_performances.team_id = my_team_person.team_id)
-- Other members of teams we're also a member of, skipping
-- ourselves. An `INNER JOIN` is fine here because we know
-- a team with only ourselves as a member isn't interesting
-- and we might as well skip it.
INNER JOIN team_person other_team_person ON (
    my_team_person.team_id = other_team_person.team_id
    AND my_team_person.person_id <> other_team_person.person_id)
-- How each of those people performed in each team they're in
-- (because of previous filter, only considers teams we're in too).
-- INNER JOIN because if they never played they can't beat us.
INNER JOIN performances other_performances ON (
    other_team_person.person_id = other_performances.person_id
    AND other_team_person.team_id = other_performances.team_id)
-- Make sure `my_team_person` is only teams we're a member of
WHERE my_team_person.person_id = $1
-- Also discard rows where the other person didn't do better than us
AND my_performances.score < other_performances.score
-- Emit one row per team we're a member of
GROUP BY my_performances.team_id;
If you want to show teams where you never played and teams where you're the only player, you'll need to change some INNER JOINs to LEFT OUTER JOINs.
If you want to compare to find people who beat you only within a given game, you're going to need an extra column on performances, then an extra term in the join on other_performances to restrict it to only matching in the same game as my_performances.
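To make the approach concrete, here is a small sketch that runs the same join against an in-memory SQLite database rather than PostgreSQL. It assumes, as the answer does, that performances carries a team_id column (the posted schema does not show one). The helper name better_scorers_by_team and the sample rows are hypothetical, and category_id is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team_person (
    team_id   INTEGER NOT NULL,
    person_id INTEGER NOT NULL,
    UNIQUE (team_id, person_id)
);
-- ASSUMPTION: team_id on performances is not in the posted schema.
CREATE TABLE performances (
    person_id INTEGER NOT NULL,
    team_id   INTEGER NOT NULL,
    score     REAL    NOT NULL
);
-- Hypothetical data: person 2 is on teams 1 and 2.
INSERT INTO team_person VALUES (1, 2), (1, 3), (1, 4), (2, 2), (2, 5);
INSERT INTO performances VALUES
    (2, 1, 10.0), (3, 1, 12.0), (3, 1, 8.0), (4, 1, 11.0),
    (2, 2, 7.0),  (5, 2, 7.5);
""")


def better_scorers_by_team(conn, person_id):
    """Map team_id -> number of distinct teammates who beat any of
    this person's scores for that team at least once."""
    rows = conn.execute("""
        SELECT mine.team_id, COUNT(DISTINCT other_tp.person_id)
        FROM team_person AS my_tp
        JOIN performances AS mine
          ON mine.person_id = my_tp.person_id
         AND mine.team_id   = my_tp.team_id
        JOIN team_person AS other_tp
          ON other_tp.team_id    = my_tp.team_id
         AND other_tp.person_id <> my_tp.person_id
        JOIN performances AS theirs
          ON theirs.person_id = other_tp.person_id
         AND theirs.team_id   = other_tp.team_id
        WHERE my_tp.person_id = ?
          AND mine.score < theirs.score
        GROUP BY mine.team_id
        ORDER BY mine.team_id
    """, (person_id,)).fetchall()
    return dict(rows)


print(better_scorers_by_team(conn, 2))
```

Because the comparison is "beat any of my scores at least once", a person with several performances is measured against their own worst one; comparing best scores instead would need a MAX per person before the comparison.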

Confusion over relationship in a tennis tournament context

I'm making a simple tennis database that features:
Tournaments (Mens/womens singles, Mens/womens doubles, Mixed doubles)
Players
Results
My Results table looked something like
ResultID
DatePlayed
Player1ID (became ParticipantID1)
Player1Score
Player2ID (became ParticipantID2)
Player2Score
I then realised that a player could play in more than one tournament so I needed another table, Participant (or could be called team, or couple).
ParticipantID
PlayerID
TournamentID
With all three IDs forming a composite key.
Now Joe (ID: 1) and Tom (ID: 2) can be doubles partners but Joe can still play in the Singles
Participant ID -- PlayerID -- TournamentID
1 ----------------- 1 ------------ 1
1 ----------------- 2 ------------ 1
2 ----------------- 1 ------------ 2
Now the problem I have with this is that auto-increment/identity would have to be turned off for the Participant table, meaning more coding work would have to be done (when creating and validating etc).
The second problem I have with this is that all the singles players data would be repeating, as they would be stored in the Player table and also the Participant table.
I could have another Results table, i.e. DoubleResults, but I feel this isn't necessary.
Is this the only way it can be done or has my mind just gone blank?
A tournament will have many matches. Therefore you need a one to many relationship between tournaments and matches.
A match will have a type (singles, doubles, etc). So your match table has to include a type id.
You will need a many to many relationship between teams and matches. A team can have either one or two players, depending on the type of match. At this point, it comes down to how much detail you want to store. It might be simply good enough to store the winning team id in this table.
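One way this could look, sketched with an in-memory SQLite database. Every table and column name here (tournament, matches, team, team_player, match_team, won) is a guess at what the answer describes, not something it specifies, and the players beyond Joe and Tom are made up. A singles player is simply a team with one member, so player details live only in the player table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tournament (tournament_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE team (team_id INTEGER PRIMARY KEY);
-- One row for a singles player, two rows for a doubles pair.
CREATE TABLE team_player (
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    PRIMARY KEY (team_id, player_id)
);
-- One tournament has many matches; each match has a type.
CREATE TABLE matches (
    match_id      INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL REFERENCES tournament(tournament_id),
    match_type    TEXT NOT NULL,
    date_played   TEXT
);
-- Many-to-many between teams and matches, with a winner flag.
CREATE TABLE match_team (
    match_id INTEGER NOT NULL REFERENCES matches(match_id),
    team_id  INTEGER NOT NULL REFERENCES team(team_id),
    won      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (match_id, team_id)
);
-- Hypothetical sample data.
INSERT INTO tournament VALUES (1, 'Mens doubles'), (2, 'Mens singles');
INSERT INTO player VALUES (1, 'Joe'), (2, 'Tom'), (3, 'Sam'), (4, 'Al');
INSERT INTO team VALUES (1), (2), (3), (4);
INSERT INTO team_player VALUES
    (1, 1), (1, 2),   -- team 1: Joe and Tom (doubles)
    (2, 1),           -- team 2: Joe alone (singles)
    (3, 3),           -- team 3: Sam alone (singles)
    (4, 3), (4, 4);   -- team 4: Sam and Al (doubles)
INSERT INTO matches VALUES
    (1, 2, 'singles', '2013-06-01'),
    (2, 1, 'doubles', '2013-06-02');
INSERT INTO match_team VALUES (1, 2, 1), (1, 3, 0), (2, 1, 1), (2, 4, 0);
""")


def winners(conn, match_id):
    """Names of the players on the winning team of one match."""
    rows = conn.execute("""
        SELECT p.name
        FROM match_team AS mt
        JOIN team_player AS tp ON tp.team_id = mt.team_id
        JOIN player AS p ON p.player_id = tp.player_id
        WHERE mt.match_id = ? AND mt.won = 1
        ORDER BY p.name
    """, (match_id,)).fetchall()
    return [name for (name,) in rows]


print(winners(conn, 1), winners(conn, 2))
```

Per-team scores could be added to match_team later if more detail than the winner is wanted.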

Database schema for two-player games

Let's have a game played by two players (or teams), where the outcome is represented by a score of both players (e.g. football). Is there an idiomatic way of storing results of such a game in a relational database?
I came up with two possible schemas, neither of which seems to be universal enough.
1. One row per game
There will be a table games with columns game_id, winner, loser, winner_points, loser_points and each game will be stored in one row of the table.
This representation is great when one needs to iterate through all games.
Computing statistics for a player is difficult (e.g. computing average number of points for a player).
2. Two rows per game
There will be a games table with columns game_id, player, opponent, player_points, opponent_points. Each game will be stored in two rows of the table and they will have the same game_id.
Iterating through all games is not trivial, but still easy
Computing average points for a player is simple: SELECT AVG(player_points) FROM games WHERE player = some_player
Unfortunately the data in the table are now redundant
I would suggest a better relational model with a games table, a player table and a third table to handle many-to-many relationship of players to games. Something like:
game( game_id, date, description, .... )
player( player_id, name, .... )
player_game( player_id, game_id, score )
Now you have a normalized (not redundant, etc.) schema that is very flexible.
To find the winner of a game you can:
select player_id, score from player_game where game_id = 'somegame' order by score desc limit 1
To find total points for a player you can:
select sum(score) from player_game where player_id = 'someplayer'
And so on...
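Here is a short sketch of that three-table layout in an in-memory SQLite database, showing both kinds of query the question worries about: per-player statistics and one row per game. The column types, the composite primary key and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (game_id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE player_game (
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    game_id   INTEGER NOT NULL REFERENCES game(game_id),
    score     INTEGER NOT NULL,
    PRIMARY KEY (player_id, game_id)
);
-- Hypothetical sample data: two players, two games.
INSERT INTO game VALUES (1, '2014-01-01'), (2, '2014-01-08');
INSERT INTO player VALUES (1, 'A'), (2, 'B');
INSERT INTO player_game VALUES (1, 1, 3), (2, 1, 1), (1, 2, 0), (2, 2, 2);
""")

# Winner of game 1: the row with the highest score (ties not handled).
winner = conn.execute("""
    SELECT player_id FROM player_game
    WHERE game_id = ? ORDER BY score DESC LIMIT 1
""", (1,)).fetchone()[0]

# Average points for player 1 across all games.
avg_points = conn.execute(
    "SELECT AVG(score) FROM player_game WHERE player_id = ?", (1,)
).fetchone()[0]

# One row per game with both scores, via a self-join; the
# a.player_id < b.player_id condition keeps each pairing once.
per_game = conn.execute("""
    SELECT a.game_id, a.player_id, a.score, b.player_id, b.score
    FROM player_game AS a
    JOIN player_game AS b
      ON b.game_id = a.game_id AND a.player_id < b.player_id
    ORDER BY a.game_id
""").fetchall()

print(winner, avg_points, per_game)
```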
I would definitely do one row per score (i.e. 2 rows per game). I would store the player and points, and if I wanted the opponent player/points, I'd look that up separately. I would almost certainly have a separate games table, with one row per game, which would hold things like when it was played, etc.
No redundant data, everything is easy to gather etc.

SQL Database Design Many to Many

I am creating a database based on a sporting game to store matches and each player involved in each match. I am having trouble resolving a many to many relationship. I currently have the following tables:
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
home_team
goals_for
goals_against
There will always be a minimum of two players in a match. Is this the best design for this approach?
I would recommend sticking with a many-to-many relationship. This allows you to easily change how many players you can have in a game without complicating the data model much.
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
is_home
goals_for
goals_against
Foreign key from PlayerMatch to Player
Foreign key from PlayerMatch to Match
-- All the matches a player has played in.
SELECT m.*
FROM Player p
JOIN PlayerMatch pm
    ON p.id = pm.player_id
JOIN Match m
    ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
-- All the players in a match.
SELECT p.*
FROM Match m
JOIN PlayerMatch pm
    ON m.id = pm.match_id
JOIN Player p
    ON p.id = pm.player_id
WHERE m.id = /*your match Id*/
-- Player information for a single match.
SELECT pm.*
FROM Player p
JOIN PlayerMatch pm
    ON p.id = pm.player_id
JOIN Match m
    ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
    AND m.id = /*your match Id*/
That is a valid option, though I would suggest a naming convention where you use the same column name in both tables (i.e. use match_id in both Match and PlayerMatch; same for player_id). This helps make your SQL a bit more clear and when doing joins in some databases (MySQL) you can then use the 'using (col1, col2, ...)' syntax for the joins.
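A small sketch of that naming convention, run against an in-memory SQLite database (which also accepts JOIN ... USING). The table name game_match is a hypothetical stand-in chosen to avoid MATCH, which is a reserved word in MySQL; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE game_match (match_id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE player_match (
    player_id     INTEGER NOT NULL REFERENCES player(player_id),
    match_id      INTEGER NOT NULL REFERENCES game_match(match_id),
    is_home       INTEGER NOT NULL,
    goals_for     INTEGER,
    goals_against INTEGER,
    PRIMARY KEY (player_id, match_id)
);
-- Hypothetical sample data.
INSERT INTO player VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO game_match VALUES (1, '2013-03-01'), (2, '2013-03-08');
INSERT INTO player_match VALUES
    (1, 1, 1, 2, 0), (2, 1, 0, 0, 2),
    (1, 2, 0, 1, 1), (2, 2, 1, 1, 1);
""")

# Because the key columns share a name on both sides, each join
# condition collapses to USING (column).
rows = conn.execute("""
    SELECT m.match_id, p.name, pm.goals_for
    FROM player AS p
    JOIN player_match AS pm USING (player_id)
    JOIN game_match AS m USING (match_id)
    WHERE p.player_id = ?
    ORDER BY m.match_id
""", (1,)).fetchall()

print(rows)
```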
I wouldn't use the many-to-many relationship, I would do like this:
Player
id
name
Match
id
home_player_id
guest_player_id
date
goals_home_player
goals_guest_player
I think I'd try to model the match first & then see what happens with the table design :
Match
-------------
match_Id
player1_Id
player2_Id
player1_Goals
player2_Goals
Where player1_Id and Player2_Id are both foreign keys onto the Player table
Player
---------
Id
Name
By convention player1 would always be the home team
then you would query it like
Select p1.name as player1_Home, p2.name as player2_away,
    m.match_Id,
    m.player1_Goals as homeGoals, m.player2_Goals as awayGoals
from Match m
inner join Player p1 on p1.Id = m.player1_Id
inner join Player p2 on p2.Id = m.player2_Id
This sort of data relationship is not at all unnatural. To set it up, just ask yourself two questions:
Do players have more than one match?
Do matches have more than one player?
If the answer is yes to both, then you have a many-to-many relationship and these are not at all uncommon. Their implementation is only slightly more complicated. In a one-to-many relationship, you'd hold a foreign key to a list of records in some table. As it happens, this is still how it works in many-to-many relationships, except that both the Players and the Matches table will need a foreign key to some list of records.
This list is called the Bridge Table. So you'll need to use a total of three tables to describe the relationship:
Players
-------
player_id
<player attribute columns, eg last_name, first_name, goals_scored, etc.>
Player_Match
------------
player_id
match_id
Matches
-------
match_id
<a list of columns that are match attributes, eg. match date, etc.>
The table in the middle of the diagram above is called a bridge table, and it does nothing more than map players to matches, and it also maps matches to a list of players. Often, bridge tables have only 2 columns, each representing a foreign key to one of the bridged tables. There is no need for a primary key in a bridge table, and if there is not one, it means that a player can have more than one of the same match. If a player can have only one of one kind of match, then make the primary key for each row of the bridge table a composite key on both of the columns.
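The effect of the composite key can be seen in a quick sketch using an in-memory SQLite database. The last_name and match_date columns and the sample rows are assumptions standing in for the attribute placeholders above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (player_id INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE Matches (match_id INTEGER PRIMARY KEY, match_date TEXT);
-- Bridge table: the composite primary key allows each
-- (player, match) pairing to appear at most once.
CREATE TABLE Player_Match (
    player_id INTEGER NOT NULL REFERENCES Players(player_id),
    match_id  INTEGER NOT NULL REFERENCES Matches(match_id),
    PRIMARY KEY (player_id, match_id)
);
INSERT INTO Players VALUES (1, 'Smith');
INSERT INTO Matches VALUES (1, '2013-03-01');
INSERT INTO Player_Match VALUES (1, 1);
""")

# Inserting the same pairing again violates the composite key.
try:
    conn.execute("INSERT INTO Player_Match VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Player_Match").fetchone()[0]
print(duplicate_rejected, row_count)
```

Without the PRIMARY KEY clause the second insert would succeed and the player would be listed twice for the same match.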
In database design, normalization is a highly desirable relational goal because it provides a database with the greatest possible flexibility and the lowest amount of redundancy. To normalize, ask yourself if the data you want to put in a table is -really- an actual attribute of the object described by the primary key. For example, is home_team an actual attribute of the match? I would say no, it is not. In this case, you should replace home_team in your PlayerMatch table with a foreign key to a Teams table. In your Matches table, you ought to have two columns: one for a home team foreign key, and one for the away team foreign key. The teams are not actual attributes of a match, and so to normalize the Match table, you'd want to put those data in tables of their own.
Agree with M Hagopian; the OP's schema looks like a good start.