How to structure tables for television tracking? - sql

I am working on an app that tracks which TV shows you are watching and which episodes of each show you have watched.
Currently I can track which users are watching which shows. This is relatively simple and is structured like so:
table structure
So I have a User table with a primary key id that identifies the user, plus some other info about that user which is not relevant here.
Following this I have another table, TV_SHOWS, which stores a user_id as a foreign key, some information about the show, and crucially a field called user_watching that holds a boolean indicating whether the user is watching that show.
It works like this: when the user clicks a button to add the show, a row is inserted into the TV_SHOWS table with all the information and user_watching = true. At some point the user may wish to stop watching, so they can click another button that simply updates user_watching to false.
Now I wish to expand on this idea so that users can track which seasons and episodes of a TV show they have watched. In most cases a TV show comprises many seasons and each season comprises many episodes.
I would like some help with how best to structure this database schema.
I initially thought that I could just expand along the same lines as before, for instance:
I simply add two IDs, one for the season and one for the episode. That way I can tell whether a user has watched a season by checking whether they have watched all the episodes in that season, and whether they have watched a show by checking whether they have watched all the seasons in the show. But I don't think this is good practice.
I could try one table for shows, another for seasons and a final one for episodes, but I am not sure where I would "track" whether a user had watched one of those. Would it have to be in that specific table, or would I need another table to track it?

Short answer: I think the latter design is a better option from a scalable design perspective.
Longer Answer:
The latter option, with show <--> season <--> episode separated into individual tables, is closer to a textbook design (Third Normal Form, I think). The idea is to reduce redundant data as much as possible, which minimizes storage space. Minimizing redundant data is a general best practice. By separating the three, you keep the attributes that are specific to each entity from being replicated.
For example, let's assume you track TV show attributes like title, description or an average ranking. If you maintain a single table with show/season/episode, then the TV show attributes have to be replicated (i.e. stored redundantly) for every season and every episode in order to satisfy the table's constraints. Conversely, separating out the TV show attributes avoids the redundant attribute data. It also allows the attribute set to grow or change over time.
This design practice, though, comes with a computational cost, as you need to JOIN the structures together depending upon the use case. For example, if you always need to know the show title, the season number and the episode title (attributes from each of the tables), then you'll need to develop the necessary join logic to obtain those attributes. This cost can be mitigated using a View, which opens another set of decisions.
To your comment on the need for two keys: following the three-table design, you'll likely want a primary key of some sort on each table. If you structure the episode table with a reference key to the season (and maybe the show), then you should only need to use the episode key, as you can de-reference the season and TV show from the episode using join logic. In effect, the episode key should be enough to figure out the season and TV show through the linking.
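As a rough sketch of the three-table idea with a View on top, here is one way it could look in SQLite, driven from Python. The table and column names (tv_show, season, episode, episode_detail and the surrogate *_id keys) are my own assumptions for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
-- Hypothetical names; each table has its own surrogate primary key.
CREATE TABLE tv_show (show_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE season (
    season_id INTEGER PRIMARY KEY,
    show_id   INTEGER NOT NULL REFERENCES tv_show(show_id),
    season_no INTEGER NOT NULL,
    UNIQUE (show_id, season_no)
);
CREATE TABLE episode (
    episode_id INTEGER PRIMARY KEY,
    season_id  INTEGER NOT NULL REFERENCES season(season_id),
    episode_no INTEGER NOT NULL,
    title      TEXT NOT NULL,
    UNIQUE (season_id, episode_no)
);
-- The view hides the join logic from callers.
CREATE VIEW episode_detail AS
SELECT e.episode_id,
       s.title  AS show_title,
       se.season_no,
       e.episode_no,
       e.title  AS episode_title
FROM episode e
JOIN season  se ON se.season_id = e.season_id
JOIN tv_show s  ON s.show_id    = se.show_id;

INSERT INTO tv_show VALUES (1, 'Example Show');
INSERT INTO season  VALUES (10, 1, 1);
INSERT INTO episode VALUES (100, 10, 1, 'Pilot');
""")

# The episode key alone is enough to recover the season and the show.
row = conn.execute(
    "SELECT show_title, season_no, episode_title "
    "FROM episode_detail WHERE episode_id = ?",
    (100,),
).fetchone()
print(row)
```

A table that records what a user has watched would then only need a user key and an episode_id.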

-- User USR exists.
--
user {USR}
PK {USR}
-- TV show SHW exists.
--
show {SHW}
PK {SHW}
-- Season number SEA# of show SHW exists.
--
season {SHW, SEA#}
PK {SHW, SEA#}
FK {SHW} REFERENCES show {SHW}
-- Episode number EPI# of season number SEA#
-- of show SHW, in duration of DUR_E minutes, exists.
--
episode {SHW, SEA#, EPI#, DUR_E}
PK {SHW, SEA#, EPI#}
FK {SHW, SEA#} REFERENCES season {SHW, SEA#}
-- User USR watched DUR_W minutes of episode number EPI#
-- of season number SEA# of show SHW.
--
user_show {USR, SHW, SEA#, EPI#, DUR_W}
PK {USR, SHW, SEA#, EPI#}
FK1 {SHW, SEA#, EPI#} REFERENCES
episode {SHW, SEA#, EPI#}
FK2 {USR} REFERENCES user {USR}
Note:
All attributes (columns) NOT NULL
PK = Primary Key
FK = Foreign Key
Using suffix # to save on screen space.
OK for SQL Server and Oracle, for others use _NO.
For example, rename EPI# to EPI_NO.
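The predicates above translate fairly directly into DDL. Below is a minimal sketch in SQLite via Python, using the _NO naming from the note. I renamed user and show to app_user and tv_show to stay clear of reserved words in some databases, and I am assuming that "watched" means DUR_W >= DUR_E; both choices are mine, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE app_user (usr TEXT NOT NULL PRIMARY KEY);
CREATE TABLE tv_show  (shw TEXT NOT NULL PRIMARY KEY);
CREATE TABLE season (
    shw    TEXT    NOT NULL REFERENCES tv_show(shw),
    sea_no INTEGER NOT NULL,
    PRIMARY KEY (shw, sea_no)
);
CREATE TABLE episode (
    shw    TEXT    NOT NULL,
    sea_no INTEGER NOT NULL,
    epi_no INTEGER NOT NULL,
    dur_e  INTEGER NOT NULL,
    PRIMARY KEY (shw, sea_no, epi_no),
    FOREIGN KEY (shw, sea_no) REFERENCES season(shw, sea_no)
);
CREATE TABLE user_show (
    usr    TEXT    NOT NULL REFERENCES app_user(usr),
    shw    TEXT    NOT NULL,
    sea_no INTEGER NOT NULL,
    epi_no INTEGER NOT NULL,
    dur_w  INTEGER NOT NULL,
    PRIMARY KEY (usr, shw, sea_no, epi_no),
    FOREIGN KEY (shw, sea_no, epi_no) REFERENCES episode(shw, sea_no, epi_no)
);
-- Made-up sample rows.
INSERT INTO app_user VALUES ('ann');
INSERT INTO tv_show  VALUES ('S');
INSERT INTO season   VALUES ('S', 1), ('S', 2);
INSERT INTO episode  VALUES ('S', 1, 1, 45), ('S', 1, 2, 45), ('S', 2, 1, 45);
INSERT INTO user_show VALUES
    ('ann', 'S', 1, 1, 45), ('ann', 'S', 1, 2, 45), ('ann', 'S', 2, 1, 10);
""")

def season_watched(usr, shw, sea_no):
    """True if the user fully watched every episode of the season.

    Assumption: an episode counts as watched when dur_w >= dur_e.
    """
    total, done = conn.execute(
        """
        SELECT COUNT(*), COUNT(w.usr)
        FROM episode e
        LEFT JOIN user_show w
               ON w.shw = e.shw AND w.sea_no = e.sea_no AND w.epi_no = e.epi_no
              AND w.usr = ? AND w.dur_w >= e.dur_e
        WHERE e.shw = ? AND e.sea_no = ?
        """,
        (usr, shw, sea_no),
    ).fetchone()
    return total > 0 and total == done

print(season_watched("ann", "S", 1), season_watched("ann", "S", 2))
```

Nothing is stored per season or per show; "watched the season" is derived from the episode rows.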

Related

Create simple database for chess tournaments

I am trying to make a simple app for chess tournaments, but I have a problem with the database. I have users that participate in a tournament (that's fine), but how do I assign users to rounds and matches? Should I make further relations, user_tournament-round-tournament and user_tournament-match-round?
Please see this answer as food for thought rather than a solution. Your question does not contain enough information to cover all use cases, so the answer below contains a lot of speculation.
In my oversimplified view, and picking up on your initial model, the tournament_competitors table (renamed from user_tournament, since we have competitors and not users) would create a unique id for each enrolled competitor. This id would be used as a reference in a tournament_matches table. That table would link twice to tournament_competitors, since it connects two opponents (constraint warning). The table would also register the match type.
For the match type, I see two possibilities.
The matches table would list all possible match types (final, semi-final, quarter-final, elimination rounds, etc.) and these would be referred to in the tournament_matches table via id (composite key in the form tournament_id-competitor_id-group_id). This approach, especially for the elimination round matches, requires a way to link the number of competitors in each elimination group with the number of matches each competitor has to go through before they are considered eliminated or not, creating a round number. I see this as business logic, so it does not belong in the DB. The group_id also needs to be calculated, and that would best be done in the application.
The alternative is to have the various match types in the tournament_matches table as a free field - populated by the application. The tournament structure (Number of Groups, number of opponents in each group, etc.) would be defined as attributes in the tournaments table. In this view there is no need for the rounds table.
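To make the "links twice" point concrete, here is a small sketch of the second variant (match type as a free field) in SQLite via Python. The column names white_id and black_id and the helper add_match are hypothetical, and the CHECK is only one possible reading of the "constraint warning".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tournament_competitors (
    competitor_id INTEGER PRIMARY KEY,
    tournament_id INTEGER NOT NULL,
    user_id       INTEGER NOT NULL,
    UNIQUE (tournament_id, user_id)        -- a user enrolls once per tournament
);
CREATE TABLE tournament_matches (
    match_id   INTEGER PRIMARY KEY,
    white_id   INTEGER NOT NULL REFERENCES tournament_competitors(competitor_id),
    black_id   INTEGER NOT NULL REFERENCES tournament_competitors(competitor_id),
    match_type TEXT NOT NULL,              -- free field filled in by the application
    CHECK (white_id <> black_id)           -- nobody plays against themselves
);
INSERT INTO tournament_competitors VALUES (1, 1, 501), (2, 1, 502);
""")

def add_match(white_id, black_id, match_type):
    """Insert a match; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tournament_matches (white_id, black_id, match_type) "
                "VALUES (?, ?, ?)",
                (white_id, black_id, match_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(add_match(1, 2, "final"), add_match(1, 1, "final"), add_match(1, 99, "final"))
```

This does not check that both competitors belong to the same tournament; that would need tournament_id in the match table as well, or a check in the application.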

Is there anything fundamentally wrong with my database relations table?

Would this be the correct layout for a diagram as such? A few of these tables share the same primary key, but I am not sure if this is the best practise/correct relationships that I should set out.
It's for a local level, whereby players don't change teams and assuming that player positions are final. The aim is to gather statistics to show later for analysis.
The Squad table should be a linking table that creates a many to many relationship between Players and Team. Since each Player/Team combination can occur only once, both columns Team_ID and Player_ID should be part of the primary key.
Squad should be on the n-side of two relationships. Its name should probably be something like Membership.
Why do you need a separate PlayerStatistics table? Apparently it stores statistics for the same Player_ID/Team_ID combinations as Squad. The fields of this table should go to the Squad table.
Shouldn't the Positions be per membership? One position per membership, i.e. one player has one defined position in each team, in which case Position_ID should be a column in Squad.
There should be two relationships between Team and MatchStatistics. One on Home_team_ID and one on Away_team_ID.
Alternatively you could associate the PlayerStatistics to Player and Match and thus store what each player has done in each single game. You would then retrieve the overall player statistics or the player-per-team statistics through appropriate queries.
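A minimal sketch of these suggestions, in SQLite via Python: a membership table keyed on (team_id, player_id) carrying the position, and per-match player statistics that are aggregated by a query. Since the original diagram is not available, all table and column names here (membership, player_match_stats, goals) are my own placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE team   (team_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE membership (                 -- the linking table ("Squad")
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    position  TEXT NOT NULL,              -- one position per membership
    PRIMARY KEY (team_id, player_id)
);
CREATE TABLE player_match_stats (         -- what a player did in one match
    match_id  INTEGER NOT NULL,
    team_id   INTEGER NOT NULL,
    player_id INTEGER NOT NULL,
    goals     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (match_id, team_id, player_id),
    FOREIGN KEY (team_id, player_id) REFERENCES membership(team_id, player_id)
);
-- Made-up sample rows.
INSERT INTO team   VALUES (1, 'Home FC');
INSERT INTO player VALUES (7, 'Sample Player');
INSERT INTO membership VALUES (1, 7, 'Forward');
INSERT INTO player_match_stats VALUES (100, 1, 7, 2), (101, 1, 7, 1);
""")

# Overall player-per-team statistics come from a query, not a stored table.
totals = conn.execute(
    "SELECT player_id, team_id, SUM(goals) "
    "FROM player_match_stats GROUP BY player_id, team_id"
).fetchall()

# The composite primary key rejects a second membership row for the same pair.
try:
    conn.execute("INSERT INTO membership VALUES (1, 7, 'Defender')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(totals, duplicate_allowed)
```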
Messed up in my mind unless you have some strange requirements.
With this design a squad is limited to a single player.
Team_ID is associated with Statistics (not player). If you want a player associated with a single team then do that in Players. And then you should actually merge Statistics with Players.
A link on PK to PK between two tables is rarely a proper design.
If you want a player to be able to play on multiple teams then have PlayerID, TeamID a composite key in Statistics.
You need to disclose the requirements for a proper review. Squad is clearly messed up but you have not stated the purpose of squad.

Relational database tables

I'm currently working on an ASP.Net MVC project for a software engineering class. My goal is to create a small online game rental system. I currently have 3 tables, Movies, Games and Registrants, and I'm using LINQ-to-SQL to define each of these tables as classes for my model. So far I've created models for Movies and Games. What I would like to do when creating the Registrant model is create a relationship between Registrants and Movies and Games. What I've tried so far is to define a foreign key between the ID (the primary key in the Registrant table) and a registrantID field in both Movies and Games. What I realized is that if I were to remove an instance of a registrant, it would delete the associated movie and/or game from the other tables. What I'm thinking of doing is creating two separate models, rentedGames and rentedMovies, and creating a relationship between those and the Games and Movies tables in order to model a registrant renting/returning/buying movies or games from the store.
In Summary:
What I have so far:
3 tables: Registrants, Movies and Games.
LINQ-to-SQL models for my inventory of movies and games.
What I'm trying to set up:
A model for a registrant renting/returning a movie and/or game. When a game is rented/returned, a flag is placed next to the item in the inventory to indicate its status.
Question:
Will adding separate tables to model a rented movie/game prevent items defined in my inventory models from being deleted? i.e. when a customer returns a rented movie, the rentedMovie instance is deleted, but not the movie it is referring to in the movie inventory.
Is there such a thing as a related table having a status flag set on the related entry, as opposed to the entry being deleted, whenever the associated entry in the other table is modified? i.e. when a customer returns a rented movie, the rentedMovie instance sets a flag in the movie it refers to that it's available for rent, and the rentedMovie instance is then deleted.
I'd go about this a bit differently. First, is there a real reason to treat a Movie and a Game as separate entities? Why not have a RentableItem that can be either a movie, a game, a game machine, a Blu-ray player, or whatever? You'd key it by an item_id field, and it would have the expected metadata (title, type, genre, rental_class, and so on).
Then you need to model the fact that a Registrant rents one or more RentableItems. This can be done with a Rental table, whose rows each connect one rented RentableItem with a particular Registrant. That is, the Rental is keyed by a rental_id and it has a foreign key to RentableItem.item_id and a foreign key to Registrant.registrant_id. The Rental would also have the due date, a "returned" flag, the price of the rental, etc.
Then you know a RentableItem is not in the store if there is a Rental record whose item_id is the same as the RentableItem's and whose "returned" flag is false. You never have to modify the RentableItem table itself, just the Rental table.
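Here is a minimal sketch of that RentableItem / Rental layout in SQLite via Python. The snake_case table names, the sample rows and the helper is_in_store are assumptions of mine for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE registrant (registrant_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE rentable_item (
    item_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    type    TEXT NOT NULL                  -- 'movie', 'game', ...
);
CREATE TABLE rental (
    rental_id     INTEGER PRIMARY KEY,
    item_id       INTEGER NOT NULL REFERENCES rentable_item(item_id),
    registrant_id INTEGER NOT NULL REFERENCES registrant(registrant_id),
    due_date      TEXT NOT NULL,
    returned      INTEGER NOT NULL DEFAULT 0 CHECK (returned IN (0, 1))
);
-- Made-up sample rows.
INSERT INTO registrant    VALUES (1, 'Sample Registrant');
INSERT INTO rentable_item VALUES (1, 'Sample Movie', 'movie');
""")

def is_in_store(item_id):
    """An item is in the store when it has no rental with returned = 0."""
    (open_rentals,) = conn.execute(
        "SELECT COUNT(*) FROM rental WHERE item_id = ? AND returned = 0",
        (item_id,),
    ).fetchone()
    return open_rentals == 0

before = is_in_store(1)
conn.execute(
    "INSERT INTO rental (item_id, registrant_id, due_date) VALUES (1, 1, '2024-01-08')"
)
while_out = is_in_store(1)
# Returning the item only touches the rental row, never rentable_item.
conn.execute("UPDATE rental SET returned = 1 WHERE item_id = 1 AND returned = 0")
after_return = is_in_store(1)
print(before, while_out, after_return)
```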
You're right to create separate tables for rentedGames and rentedMovies, since this model now allows for more than one movie or game of the same type being rented at the same time, which is surely more realistic than having only one instance of a particular movie or game.
This will prevent the deletion of the parent record, when the link record (rentedMovie, say) is deleted. But this deletion of the parent movie should not be happening anyway if you've set up your relationship to not 'cascade delete', and you allowed the registrantID field in the original Movies or Games tables to be nullable.
To answer your second question (which I realise assumes only one movie/game for any particular title): the way this is normally done, if you're using link tables, which is what you want to do, is simply to delete the rentedMovie/Game record. The absence of a link record for any Movie or Game is all your code needs to determine in order to know that that movie or game is now rentable (again).
I know you're doing this for a class / practice, so this may not be relevant, but consider that having the rental history for things is often very useful. Because of this, you may not want to delete the rented records, but instead just mark the item as returned.
Consider:
TABLE RentalTransaction:
RentalTransactionID integer PK NOT NULL
CustomerID integer FK NOT NULL
RentedOn datetime NOT NULL
DueDate datetime NOT NULL
<..any other fields you may need..>
TABLE RentalItems:
RentedID integer PK NOT NULL
RentalTransactionID integer FK NOT NULL
RentedItemID integer FK NOT NULL
RentedQty integer NOT NULL
RentalReturned datetime NULL
You can see whether any individual item is out by checking whether its RentalReturned field is null. If it is non-null, then you know the item is back, and you can aggregate rental data to see how often it goes out, what the average length of rental is, etc. You would have to build in some checks to make sure you weren't renting more copies of an item than you actually have, and other such things, but I think this is overall a more flexible start to a schema. It may also be overly complicated for what you're doing, but I wanted to at least bring the idea up.
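To show the kind of aggregation this makes possible, here is a sketch in SQLite via Python. It keeps the table and column names from above but stores dates as ISO text (SQLite has no datetime type), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RentalTransaction (
    RentalTransactionID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    RentedOn   TEXT NOT NULL,
    DueDate    TEXT NOT NULL
);
CREATE TABLE RentalItems (
    RentedID            INTEGER PRIMARY KEY,
    RentalTransactionID INTEGER NOT NULL
        REFERENCES RentalTransaction(RentalTransactionID),
    RentedItemID        INTEGER NOT NULL,
    RentedQty           INTEGER NOT NULL,
    RentalReturned      TEXT NULL
);
-- Made-up history for item 42.
INSERT INTO RentalTransaction VALUES (1, 1, '2024-01-01', '2024-01-08');
INSERT INTO RentalTransaction VALUES (2, 2, '2024-02-01', '2024-02-08');
INSERT INTO RentalTransaction VALUES (3, 1, '2024-03-01', '2024-03-08');
INSERT INTO RentalItems VALUES (1, 1, 42, 1, '2024-01-04');  -- back after 3 days
INSERT INTO RentalItems VALUES (2, 2, 42, 1, '2024-02-06');  -- back after 5 days
INSERT INTO RentalItems VALUES (3, 3, 42, 1, NULL);          -- still out
""")

# How often the item went out and the average length of completed rentals.
# AVG skips the row whose RentalReturned is NULL.
times_rented, avg_days = conn.execute(
    """
    SELECT COUNT(*),
           AVG(julianday(i.RentalReturned) - julianday(t.RentedOn))
    FROM RentalItems i
    JOIN RentalTransaction t
      ON t.RentalTransactionID = i.RentalTransactionID
    WHERE i.RentedItemID = ?
    """,
    (42,),
).fetchone()

# Copies currently out: rows that have not been returned yet.
(copies_out,) = conn.execute(
    "SELECT COALESCE(SUM(RentedQty), 0) FROM RentalItems "
    "WHERE RentedItemID = ? AND RentalReturned IS NULL",
    (42,),
).fetchone()
print(times_rented, avg_days, copies_out)
```

Comparing copies_out against the number of copies you own would be one place to put the "don't rent more than you have" check.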
Do you really want to delete the rentedMovie instance? How will you report on how many movies a person has rented etc?
I'd suggest rethinking your model slightly. You need somewhere to store people data, somewhere to store item data and somewhere to store people/item data as a first step.
Ignore the difference between movies and games for now - that becomes a process of normalisation once you've defined your underlying structure.
As a simple starting point you should have:
Persons 1..1 ---- 1..* Hires 0..* ---- 1..1 Items
where the Hires table is a linking table between the two others with a combined key made up of personID, ItemID and a time-stamp of some description (to allow re-renting of the same movie).
You can then look at having a separate table for item types etc.
First thing to consider is that a movie is actually two entities, title and media. Title is "Lord of the Rings", while media is a DVD you take home. One title can have many media (copies), while one media has one title. Rental table has a row for each media-rental, this table gets a new row each time a bar code is scanned on rental, while DateReturned is populated upon return. Status field in the Media table tracks the in/out status for each disc/game.
If you feel that you need to track which movies were rented together by a customer, you can find that via DateRented (datetime), or add a ReceiptNumber or ShoppingBasketID to the Rental table.

How do you resolve a many-to-many collection entity in a RDBMS?

I'm trying to model artists and songs, and I have a problem: a Song_Performance can be performed by many artists (say, a duet), so I have an Artist_Group to represent who the song is performed by.
Well, I now have a many-to-many relationship between Artist and Artist_Group, where an Artist_Group is uniquely identified by the collection of artists in that group. I can create an intersection entity that represents an Artist's participation in an Artist_Group (Artist_Group_Participation?)
I'm having trouble coming up with how to come up with a primary key for the Artist_Group entity that preserves the fact that the same set of artists represents the same group, and lacking a primary key for the Artist_Group entity means I'm lacking a foreign key for the Artist_Group_Participation entity.
The book "Mastering Data Modeling" by John Carlis and Joseph Maguire mentions this shape, refers to it as a "Many-Many Collection Entity" and states that it is very rare, but doesn't say how to resolve it, since obviously a many-to-many relationship can't be stored directly in an RDBMS. How do I go about representing this?
Edit:
Looks like everyone is suggesting an intersection table, but that's not my issue here. I have that. My issue is enforcing the constraint that you cannot add an Artist_Group entry where the group of artists that it contains are the same as an existing group, ignoring order. I thought about having the ID for Artist_Group be a varchar that is the concatenation of the various artists that comprise it, which would solve the issue if order mattered, but having an Artist_Group for "Elton John and Billy Joel" doesn't prevent the addition of a group for "Billy Joel and Elton John".
I guess I'm missing the point of the "Artist_Group" relation.
The data model in my mind is:
Artist: an individual person.
Song: The song itself.
Performance: A particular performance or arrangement of a song. Usually this would have one song, but you could provide an m:n linking table to accommodate a medley. Ideally, this would be a single real performance, i.e., there would be an associated date.
Recording: A particular fixed version of a performance (CD or whatever). Usually a Performance only has one Recording, but having a separate table would handle the Grateful Dead / multiple-bootleg scenario, as well as re-release albums, radio play vs. live vs. CD versions, etc.
Performance_Artists: A linking table from a particular performance to a list of performers. For each, you could also have an attribute that describes their role(s) in the performance (vocalist, drummer, etc.).
There's no explicit relationship between a set of performers, except that they share performances in common. Thus, any table that attempts to combine random sets of artists outside the context of a recording is not an accurate relational model, as there is no real relationship.
If you are trying to represent an explicit relationship between a set of artists (i.e., they are in the same band), well, bands have names that have uniqueness (though not enough to be a primary key), and a band could be stored simply as an Artist, and then have an Artist_Member linking table that is self-referencing back to the individual Artist records. Or you could have a separate Band table, and a Band_Members table to assign artists to it, perhaps with dates of membership. Either way, just remember that band members change over time and band roles change from one song to the next, so associating a band with a performance should not substitute for linking performances directly to the artists involved.
The primary key for both the Artist and Artist_Group would be a numeric, incremental ID. Then you'd have an Artist_Group_Participation table that has two columns: artist_id and group_id. These would be foreign keys that refer to the ID of their respective tables. Then to SELECT everything you'd use a JOIN.
EDIT: Sorry, I misunderstood your question. The only other way I can think of is add an "artists" column to your Artist_Group table that contains a serialized array (assuming you're using PHP, but other languages have equivalents) of the artists and their IDs. Then just add a UNIQUE constraint to the column.
You could make each artist's ID correspond to a bit in a bitfield. So if Elton John is ID 12 and Billy Joel is ID 123, then the "group" formed by a duet between Elton John and Billy Joel is Artist_Group ID 10633823966279326983230456482242760704 (i.e. it has the 12th and 123rd bit set).
You could enforce the relationship using the intersection table. For example, using a CHECK constraint in PostgreSQL:
CREATE TABLE Artist_Group_Participation (
artist_id int not null,
artist_group_id int not null,
PRIMARY KEY (artist_id, artist_group_id),
FOREIGN KEY (artist_id) REFERENCES Artists (artist_id),
FOREIGN KEY (artist_group_id) REFERENCES Artist_Group (artist_group_id),
CHECK (B'1'<<artist_id & artist_group_id <> 0)
);
Admittedly, this is a hack. It applies extra significance to the Artist_Group surrogate key, when surrogate keys are supposed to be unique but not contain information.
Also if you have thousands of artists, and new artists every day, things could get unwieldy because the length of the Artist_Group key's data type needs to grow larger all the time.
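The bitfield key itself is easy to reproduce. The sketch below uses a helper I am calling group_key (a made-up name) and counts bits from zero, which is what the number quoted above corresponds to.

```python
def group_key(artist_ids):
    """Build an order-independent group key with one bit per artist ID.

    Bit positions are zero-based, so artist ID n sets the bit worth 2**n.
    """
    key = 0
    for artist_id in set(artist_ids):
        key |= 1 << artist_id
    return key

duet = group_key([12, 123])   # Elton John = 12, Billy Joel = 123
print(duet)
print(duet == group_key([123, 12]))
```

Python integers are arbitrary precision, so the growth problem mentioned above shows up in the database column type, not in this code.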
I guess you could build a primary key by sorting and concatenating the artist ids?
group: 3,2,6 -> 2-3-6 and 6,3,2 -> 2-3-6
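A small sketch of that idea, in Python with SQLite: the key is built by the application and the primary key constraint then rejects the same set of artists in a different order. The names canonical_key, add_group and the artist_group table layout are assumptions of mine.

```python
import sqlite3

def canonical_key(artist_ids):
    """Sort the IDs numerically and join them, so order no longer matters."""
    return "-".join(str(i) for i in sorted(set(artist_ids)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist_group (group_key TEXT NOT NULL PRIMARY KEY)")

def add_group(artist_ids):
    """Insert the group; return False if the same set already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO artist_group (group_key) VALUES (?)",
                (canonical_key(artist_ids),),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(canonical_key([3, 2, 6]), canonical_key([6, 3, 2]))
print(add_group([3, 2, 6]), add_group([6, 3, 2]))
```

The database only sees the finished string, so every writer has to go through the same key-building function for the constraint to mean anything.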
I don't have much experience in RDBMS. However, I have read papers of Codd and books by C.J. Date.
So, instead of using RDBMS jargon, I'll try to explain in more common sensical terms (at least to me!)
Here goes -
Singer names should be standard on "First Name - Last Name" basis
Each "Singer" should have an entry in the "Artists Group" table even if they have performed solo
Each entry in the "Artists Group" will consist of multiple "Singer" entries ordered alphabetically. There should be a single occurrence of a specific combination.
Each song will have an entry of a unique record from "Artists Group" regardless of whether they are solo, duets or in a gang.
I don't know if this makes much sense, but it's my two cents!

Best Schema to represent NCAA Basketball Bracket

What is the best database schema to represent an NCAA men's basketball bracket? Here is a link if you aren't familiar: http://www.cbssports.com/collegebasketball/mayhem/brackets/viewable_men
I can see several different ways you could model this data, with a single table, many tables, hard-coded columns, somewhat dynamic ways, etc. You need a way to model both what seed and place each team is in, along with each game and the outcome (and possibly score) of each. You also need a way to represent who plays who at what stage in the tournament.
In the spirit of March Madness, I thought this would be a good question. There are some obvious answers here, and the main goal of this question is to see all of the different ways you could answer it. Which way is best could be subjective to the language you are using or how exactly you are working with it, but try to keep the answers db agnostic, language agnostic and fairly high level. If anyone has any suggestions on a better way to word this question or a better way to define it let me know in the comments.
The natural inclination is to look at a bracket in the order the games are played. You read the traditional diagram from the outside in. But let's think of it the other way around. Each game is played between two teams. One wins, the other loses.
Now, there's a bit more to it than just this. The winners of a particular pair of games face off against each other in another game. So there's also a relationship between the games themselves, irrespective of who's playing in those games. That is, the teams that face off in each game (except in the first round) are the winners of two earlier games.
So you might notice that each game has two "child games" that precede it and determine who faces off in that game. This sounds exactly like a binary tree: each node has at most two child nodes. If you know who wins each game, you can easily determine the teams in the "parent" games.
So, to design a database to model this, you really only need two entities: Team and Game. Each Game has two foreign keys that relate to other Games. The names don't matter, but we would model them as separate keys to enforce the requirement that each game have no more than two preceding games. Let's call them leftGame and rightGame, to keep with the binary tree nomenclature. Similarly, we should have a key called parentGame that tracks the reverse relationship.
Also, as I noted earlier, you can easily determine the teams that face off in each game by looking at who won the two preceding games. So you really only need to track the winner of each game. So, give the Game entity a winner foreign key to the Team table.
Now, there's the small matter of seeding the bracket. That is, modeling the match-ups for the first round games. You could model this by having a Game for each team in the overall competition where that team is the winner and has no preceding games.
So, the overall schema would be:
Game:
winner: Team
leftGame: Game
rightGame: Game
parentGame: Game
other attributes as you see fit
Team:
name
other attributes as you see fit
Of course, you would add all the other information you'd want to the entities: location, scores, outcome (in case the game was won by forfeit or some other out of the ordinary condition).
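As a rough illustration of the tree schema, here is a sketch in SQLite via Python. The column names follow the outline above in snake_case; the helper participants and the tiny two-team bracket are my own additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE game (
    game_id     INTEGER PRIMARY KEY,
    winner      INTEGER REFERENCES team(team_id),   -- NULL until played
    left_game   INTEGER REFERENCES game(game_id),
    right_game  INTEGER REFERENCES game(game_id),
    parent_game INTEGER REFERENCES game(game_id)
);
INSERT INTO team VALUES (1, 'UNC'), (2, 'Radford');
-- Seeding: one childless "game" per team, already won by that team.
INSERT INTO game (game_id, winner) VALUES (1, 1), (2, 2);
-- The real first-round game is fed by the two seed games.
INSERT INTO game (game_id, left_game, right_game) VALUES (3, 1, 2);
UPDATE game SET parent_game = 3 WHERE game_id IN (1, 2);
""")

def participants(game_id):
    """Teams in a game are the winners of its left and right child games."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM game g
        JOIN game c ON c.game_id IN (g.left_game, g.right_game)
        JOIN team t ON t.team_id = c.winner
        WHERE g.game_id = ?
        ORDER BY c.game_id = g.right_game   -- left child first
        """,
        (game_id,),
    ).fetchall()
    return [name for (name,) in rows]

print(participants(3))
```

Recording a result is then a single update of game.winner, and the next round's participants fall out of the same query.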
For a RDBMS, I think the simplest approach that's still flexible enough to accommodate the majority of situations is to do the following:
Teams has [team-id (PK)], [name], [region-id (FK to Regions)], [initial-seed]. You will have one entry for each team. (The regions table is a trivial code table with only four entries, one for each NCAA region, and is not listed here.)
Participants has [game-id (FK to Games)], [team-id (FK to Teams)], [score (nullable)], [outcome]. [score] is nullable to reflect that a team might forfeit. You will typically have two Participants per Game.
Games has [game-id (PK)], [date], [location]. To find out which teams played in a game, look up the appropriate game-id in the Participants table. (Remember, there might be more than two teams if someone dropped out or was disqualified.)
To set up the initial bracket, match the appropriate seeds to each other. As games are played, note which team has outcome = Winner for a particular game; this team is matched up against the winner of another game. Fill in the bracket until there are no more winning teams left.
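A minimal sketch of these three tables in SQLite via Python. I have folded the Regions code table into a plain text column to keep it short, the outcome values other than 'Winner' are my own guesses, and the teams and scores are placeholders rather than real results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE teams (
    team_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    region       TEXT NOT NULL,      -- simplification: no separate Regions table
    initial_seed INTEGER NOT NULL
);
CREATE TABLE games (
    game_id  INTEGER PRIMARY KEY,
    date     TEXT NOT NULL,
    location TEXT NOT NULL
);
CREATE TABLE participants (
    game_id INTEGER NOT NULL REFERENCES games(game_id),
    team_id INTEGER NOT NULL REFERENCES teams(team_id),
    score   INTEGER,                 -- NULL when a team forfeits
    outcome TEXT NOT NULL
        CHECK (outcome IN ('Pending', 'Winner', 'Loser', 'Forfeit')),
    PRIMARY KEY (game_id, team_id)
);
-- Placeholder data, not real tournament results.
INSERT INTO teams VALUES (1, 'Team A', 'South', 1), (2, 'Team B', 'South', 16);
INSERT INTO games VALUES (1, '2024-03-21', 'Sample Arena');
INSERT INTO participants VALUES (1, 1, 80, 'Winner'), (1, 2, 60, 'Loser');
""")

def teams_in(game_id):
    """Names of the teams that took part in a game, best seed first."""
    rows = conn.execute(
        "SELECT t.name FROM participants p "
        "JOIN teams t ON t.team_id = p.team_id "
        "WHERE p.game_id = ? ORDER BY t.initial_seed",
        (game_id,),
    ).fetchall()
    return [name for (name,) in rows]

def winner_of(game_id):
    """Name of the team with outcome = 'Winner', or None if undecided."""
    row = conn.execute(
        "SELECT t.name FROM participants p "
        "JOIN teams t ON t.team_id = p.team_id "
        "WHERE p.game_id = ? AND p.outcome = 'Winner'",
        (game_id,),
    ).fetchone()
    return row[0] if row else None

print(teams_in(1), winner_of(1))
```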
Since you didn't specify an RDBMS, I'm gonna be a little different and go with a CouchDB approach, since I was reading about that this weekend. Here's the document structure I've come up with to represent a game.
{
"round" : 1, //The final would be round 5, and I guess Alabama St. vs. Morehead would be 0
"location" : "Dayton, OH",
"division": "South",
"teams" : ["UNC", "Radford"], //A feature of Couch is that fields like teams don't need a fixed number of columns.
"winner" : "UNC" //Showing my bias
}
A more interesting or complete application might have data for teams, rankings, and the like stored somewhere as well. John's approach covers that angle well, it seems. I welcome any comments from people who know better on my Couch skills.
I created a small system with the following tables:
Games: GameId, TournId, RoundId, Sequence, Date, VisitorId, VisitorScore, HomeId, HomeScore, WinnerId, WinnerGameId, WinnerHome (bit)
Predictions: PredId, UserId, GameId, PredVisitorId, PredHomeId, PredWinnerId
Rounds: RoundId, TournId, RoundNum, Heading1, Heading2
Teams: TeamId, TournId, TeamName, Seed, MoreInfo, Url
Tournaments: TournId, TournDesc
Users: TournId, UserName
WinnerGameId connects the winner of a game to their next game. WinnerHome tells whether the winner is the home or visitor of that next game. Other than that, I think it's pretty self explanatory.
Proposed Model
Proposed ER Diagram http://img257.imageshack.us/img257/1464/ncaaer.jpg
Team Table
All we need to know about a team is the name and seed. Therefore we need a "Team" table to store the seed value. The only candidate key is team name so we will use that as the primary to keep things simple. NCAA team names are unlikely to change over the course of a single tournament or contain duplicates so it should be an adequate key.
MatchUp Table
A "MatchUp" table can be used to pair the teams into each of the match ups. Foreign Keys (FK1, FK2) to the "Team" will ensure that the teams exist and a primary key over these values ensures that teams are only matched up against each other once.
A foreign key (FK4) to the "Team" table from the "MatchUp" table will record the winner. Logically the winner would need to be one of the two teams participating in the match up. A check constraint against the primary key could ensure this.
Once the outcome of a match up has been determined, the victor's seed could be retrieved from the team table and compared against the other victors' seeds in order to determine subsequent match ups. Upon doing so, an FK (FK3) to the resulting match up can be written to the determining match ups in order to depict the progress of the tournament (although this data could probably be derived at any time).
Games Table
I also modeled out the games of each Match Up. A game is identified by the match up it is a part of and a sequence number based on the order in which it took place during the match up. Games have a winner from the team table (FK2). Score could be recorded in this table as well.
4 tables:
Team(Team, Region, Seed)
User(UserId, Email, blablabla)
Bracket(BracketId, UserId, Points)
Pick(BracketId, GameId, Team, Points)
Each bracket a person submits will have 63 rows in the Pick table.
After each game is played you would update the pick table to score individual picks. The Points field in this table will be null for a game not yet played, 0 for an incorrect pick, or a positive number for a correct pick. GameId is just a key identifying where in that user's bracket this pick goes (ex: East_Round2_Game2, FinalFour_Game1).
The points column in the bracket table can be updated after each update of the pick table so it contains the sum of points for that bracket. The most looked at thing will be the standings, don't want to re-sum those every time someone wants to view the leader board.
You don't need to keep a table with all the games that actually get played or their results, just update the pick table after each game. You can even do the bracket highlighting of correct/incorrect picks by just looking at the Points column in the pick table.
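The scoring step could look roughly like this, sketched in SQLite via Python. The function score_game and the point value passed to it are hypothetical; how many points each round is worth is not specified above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE bracket (
    bracket_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    points     INTEGER NOT NULL DEFAULT 0     -- cached sum for the standings
);
CREATE TABLE pick (
    bracket_id INTEGER NOT NULL REFERENCES bracket(bracket_id),
    game_id    TEXT NOT NULL,                 -- e.g. 'East_Round2_Game2'
    team       TEXT NOT NULL,
    points     INTEGER,                       -- NULL until the game is played
    PRIMARY KEY (bracket_id, game_id)
);
INSERT INTO bracket (bracket_id, user_id) VALUES (1, 501), (2, 502);
INSERT INTO pick (bracket_id, game_id, team) VALUES
    (1, 'East_Round2_Game2', 'UNC'),
    (2, 'East_Round2_Game2', 'Radford');
""")

def score_game(game_id, winning_team, value):
    """Score every pick for one game, then refresh the cached bracket totals."""
    with conn:
        conn.execute(
            "UPDATE pick SET points = CASE WHEN team = ? THEN ? ELSE 0 END "
            "WHERE game_id = ?",
            (winning_team, value, game_id),
        )
        conn.execute(
            "UPDATE bracket SET points = ("
            "  SELECT COALESCE(SUM(points), 0) FROM pick"
            "  WHERE pick.bracket_id = bracket.bracket_id)"
        )

score_game("East_Round2_Game2", "UNC", 2)
standings = conn.execute(
    "SELECT bracket_id, points FROM bracket ORDER BY points DESC, bracket_id"
).fetchall()
print(standings)
```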
In keeping track of a large number of different bracket predictions: You could use 67 bits for keeping track of the outcome of each game. (ie. Each of the sixty-seven games played in the tournament is each represented by a bit, 1 = "team A wins", 0 = "team B wins"). To display any given bracket, you can use a pretty simple function to map the 67 bits to the UI. The function knows the team names and their initial location, and then it tracks their movement through the bracket as it traces the 'bitboard'.
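Here is a sketch of such a mapping function in Python. It assumes a field whose size is a power of two (so 63 bits for 64 teams; play-in games would need extra handling), that games are numbered round by round from the top of the bracket, and that a 1 bit means the first-listed team wins. The name play_bracket is made up.

```python
def play_bracket(teams, bits):
    """Trace a single-elimination bracket from a bitboard of outcomes.

    teams: team names in initial bracket order; length must be a power of two.
    bits:  integer whose bit i decides game i (1 = first-listed team wins).
    Returns one list of winners per round; the last list holds the champion.
    """
    rounds = []
    current = list(teams)
    game = 0
    while len(current) > 1:
        winners = []
        # Neighbouring teams in the current round play each other.
        for first, second in zip(current[0::2], current[1::2]):
            winners.append(first if (bits >> game) & 1 else second)
            game += 1
        rounds.append(winners)
        current = winners
    return rounds

# Four teams, three games: game 0 -> A, game 1 -> D, final -> A.
print(play_bracket(["A", "B", "C", "D"], 0b101))
```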
I use the same schema for all of my databases.
t
--------
1 guid PK
2 guid FK
3 bit
Then in my code:
select [2],[3] from t where [1] = #1
#1 is the id of the data I am fetching. Then if [2] is not null, I select again, setting #1 to [2].
This makes it really easy to model the situation you posted.