Database Modeling of a Softball League - sql

I am modeling a database for use in a softball league website. I'm not that experienced in DB Modeling, and I'm having a hard time with a question about the future.
Right now I have the following tables:
players table (player_id, name, gender, email, team_id)
teams table (team_id, name, captain[player_id], logo, wins_Regular_season, losses_regular_season)
regular_season table (game_id, week, date, home[team_id], away[team_id], home_score, away_score, rain_date)
playoff table (pgame_id, date, home[team_id], away[team_id], home_score, away_score, winnerTo[pgame_id], loserTo[pgame_id])
To make the data persist from season to season while keeping it easy to access, should I:
A) include a year column in the tables and then filter my queries by year?
B) create new tables every year?
C) Do something else that makes more sense but that I can't think of.

This design is not only bad about the future. It is also wrong regarding the past. You're not keeping history in a proper way.
Let's suppose a player changes team: how would that fit into this design? It should be easy to get that kind of information...
And the best way of doing that (IMHO) would be also representing the season as an entity, as a concrete table. Then you should replicate this information in each relationship. Meaning, for instance, that a player does not simply belong to a team: he belongs to a team in a specific season, and may belong to another team when the season changes.
OTOH, I don't think it's wise to keep regular_season and playoff as distinct tables: they could be easily merged into one, by adding some sort of flag in order to keep that information.
Edit: This is what I mean:
Notice that there is a Season table.
A Player belongs to a Team in a Season.
NO NEED TO DUPLICATE ANYTHING. A team has only ONE record in the DB; a player will be associated to only ONE record.
I did NOT design the Playoff table, because I believe it should not exist. If the OP disagrees, just add it.
That way you can keep track of all seasons, without needing to replicate the whole DB. I think this is also better than using a year column, which is not meaningful, and cannot be easily constrained (like a foreign key can).
But, please, feel free to disagree.
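A minimal sketch of the Season-as-a-table idea, using Python's built-in sqlite3 module so it can be run as-is. The table and column names (season, team_player, game, is_playoff) and the sample rows are assumptions made for illustration, not the schema from the original diagram, and the playoff bracket columns (winnerTo, loserTo) are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE season (season_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE team   (team_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- A player belongs to a team *in a season*: at most one team per player per season.
CREATE TABLE team_player (
    season_id INTEGER NOT NULL REFERENCES season(season_id),
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    team_id   INTEGER NOT NULL REFERENCES team(team_id),
    PRIMARY KEY (season_id, player_id)
);

-- Regular-season and playoff games share one table, told apart by a flag.
CREATE TABLE game (
    game_id    INTEGER PRIMARY KEY,
    season_id  INTEGER NOT NULL REFERENCES season(season_id),
    is_playoff INTEGER NOT NULL DEFAULT 0 CHECK (is_playoff IN (0, 1)),
    game_date  TEXT NOT NULL,
    home_team  INTEGER NOT NULL REFERENCES team(team_id),
    away_team  INTEGER NOT NULL REFERENCES team(team_id),
    home_score INTEGER,
    away_score INTEGER
);
""")

# Made-up sample data: one player who changes team between two seasons.
conn.executemany("INSERT INTO season VALUES (?, ?)", [(1, "2010"), (2, "2011")])
conn.executemany("INSERT INTO team VALUES (?, ?)", [(1, "Sluggers"), (2, "Bombers")])
conn.execute("INSERT INTO player VALUES (1, 'Alex')")
conn.executemany("INSERT INTO team_player VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])

# The player's team history, one row per season.
history = conn.execute("""
    SELECT s.name, t.name
    FROM team_player tp
    JOIN season s ON s.season_id = tp.season_id
    JOIN team   t ON t.team_id   = tp.team_id
    WHERE tp.player_id = 1
    ORDER BY s.name
""").fetchall()
print(history)
```

Each team and each player still has exactly one row; only the membership row is per season.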

The standard way would be to add year columns to your tables. That way you can easily call up the past with a select query, or view. SQL Server has good support for this. I've dealt with cleanup of the other route, and it isn't pretty after a few years of data have accumulated.

I would go with option A and have a year column in the seasons table and playoff table.

You already have a date column; you can use that to find the year:
SELECT * FROM regular_season WHERE YEAR(date) = 2011
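YEAR() is available in MySQL and SQL Server, but not in every engine. As a rough illustration of the same filter, here is a runnable sketch using Python's sqlite3 module, where strftime('%Y', ...) plays the role of YEAR(). The cut-down table and the sample dates are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE regular_season (
        game_id INTEGER PRIMARY KEY,
        date    TEXT NOT NULL  -- ISO-8601 text such as '2011-05-14'
    )
""")
conn.executemany(
    "INSERT INTO regular_season (date) VALUES (?)",
    [("2010-06-01",), ("2011-05-14",), ("2011-07-02",)],
)

# SQLite has no YEAR(); strftime('%Y', ...) returns the year as text.
games_2011 = conn.execute(
    "SELECT game_id, date FROM regular_season "
    "WHERE strftime('%Y', date) = '2011' ORDER BY date"
).fetchall()
print(games_2011)
```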

It is ok to have duplicated values in SQL

I'm not a DBA so I'm not familiar with the proper lingo, so maybe the title of the question could be a little misleading.
So, the thing. I have Members for a certain system, these members can be part of a demographic segment (any kind of segment: favorite color, gender, job, etc)
These are the tables
SegmentCategory
ID, Name, Description
SegmentCategory_segment
SegmentID, SegmentCategoryID
Segment
ID, Name, Description
MemberSegment
ID, MemberID, SegmentID
So the guy that designed the DB decided to go uber normalizing everything so he put the member's gender on a segment and not in the Member's table.
Is this ok? According to my logic, gender is a property of the Member, so it must be on its entity. But doing this means there must be duplicated data (the gender on the Member and the gender as a segment). A trigger on the Member table could take care of that, though (updating the segment on a gender change).
Having to crawl 4 tables just to get a property of the member seems like over-engineering to me.
My question is whether I'm right or not? If so, how could I propose the change to the DBA?
There isn't a blanket rule you can apply to database decisions like this. It depends on what applications/processes the database is supporting. A reporting database is much easier to work with when it is more denormalized (in a well-thought-out way) than a more transactional database would be.
You can have a customer record spread across 2 tables, for instance, if some of the data is accessed or updated more often than the rest. Say you only need one half of the data in 90% of your queries, but don't want to drag around the varchar(max) fields you have there for whatever reason.
Having said that, having a table with just a gender/memberid is on the far side of extreme. From my naive understanding of your situation I feel you just need a members table with views over top for your segments.
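To make the "members table with views over top" idea concrete, here is a small sketch using Python's sqlite3 module. The member table layout and the view name member_gender_segment are hypothetical; the point is only that the segment is derived from the column, so there is nothing to keep in sync with a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    gender TEXT CHECK (gender IN ('F', 'M'))  -- NULL when unknown
);

-- The "segment" is computed from the column instead of being stored twice.
CREATE VIEW member_gender_segment AS
SELECT id AS member_id, 'Gender' AS segment_category, gender AS segment
FROM member
WHERE gender IS NOT NULL;
""")
conn.executemany(
    "INSERT INTO member (name, gender) VALUES (?, ?)",
    [("Ann", "F"), ("Bob", "M"), ("Cy", None)],
)

query = "SELECT member_id, segment FROM member_gender_segment ORDER BY member_id"
before = conn.execute(query).fetchall()

# Changing the property on the member is immediately visible through the view.
conn.execute("UPDATE member SET gender = 'M' WHERE name = 'Cy'")
after = conn.execute(query).fetchall()
print(before, after)
```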
As for the DBA, ultimately I imagine it will be them who will be needing to maintain the integrity of the data, so I would just approach them and say "hey, what do you think of this?" Hopefully they'll either see the merit or be able to give you reasons to their design decisions.

Trying to map Student with Marks obtained

I have the following tables:
Student(Student_ID(PK), FName, LName,...)
Subject_details(Subject_code(PK), Subject_Name)
Subjects_Enrolled(Student_ID,Subject_Code)
My question is:
Can I have a table like the one below?
Marks_Details(Student_ID,Subject_Code,Marks_Obtained)
Does that table break any rule of database or anything as such? My requirement is to have a table that maps the marks obtained by a student in a particular subject and this is what I've come up with. Is it correct or is there any other approach to do the same?
Please let me know the reason if you're going to downvote.
Thanks.
That data model looks fine. You're simply setting up a many-to-many relationship with some additional information (marks) contained in each record.
One suggestion would be to rename the Subject_Details table to Subject. I think this verbiage makes the relationship more clear.
Another suggestion would be to rename Subjects_Enrolled to Enrollment and just add the Marks_Obtained column to this table. This would eliminate the need for Marks_Details since the two tables basically contain the same information. Why store and maintain this data twice? The idea would be to insert a record into Enrollment when a student enrolls within a course and then to update the Marks_Obtained column at a later date when the course is completed.
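The single-table version might look something like this. It is only a sketch run through Python's sqlite3 module; the lower-case table names, the column types, and the sample student and subject are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE subject (subject_code TEXT PRIMARY KEY, subject_name TEXT NOT NULL);

-- One row per (student, subject); the mark is filled in later.
CREATE TABLE enrollment (
    student_id     INTEGER NOT NULL REFERENCES student(student_id),
    subject_code   TEXT    NOT NULL REFERENCES subject(subject_code),
    marks_obtained INTEGER,  -- NULL until the course is completed
    PRIMARY KEY (student_id, subject_code)
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Sam', 'Lee')")
conn.execute("INSERT INTO subject VALUES ('MATH101', 'Mathematics')")

# Enrolling creates the row with no mark yet.
conn.execute(
    "INSERT INTO enrollment (student_id, subject_code) VALUES (1, 'MATH101')"
)
select_mark = (
    "SELECT marks_obtained FROM enrollment "
    "WHERE student_id = 1 AND subject_code = 'MATH101'"
)
pending = conn.execute(select_mark).fetchone()[0]

# When the course is completed, the same row is updated.
conn.execute(
    "UPDATE enrollment SET marks_obtained = 87 "
    "WHERE student_id = 1 AND subject_code = 'MATH101'"
)
final = conn.execute(select_mark).fetchone()[0]
print(pending, final)
```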
Your idea doesn't "break any rule of database". I'd actually say it's pretty much the standard way of storing this data.
I would recommend giving the Marks_Details table a separate primary key, and maybe a date field. If a student retakes the subject, do you want the new data to overwrite the old, or do you want to keep both? It's up to you really.

Table structure for Scheduling App in SQL DB

I'm working on a database to hold information for an on-call schedule. Currently I have a structure that looks about like this:
Table - Person: (key)ID, LName, FName, Phone, Email
Table - PersonTeam: (from Person)ID, (from Team)ID
Table - Team: (key)ID, TeamName
Table - Calendar: (key dateTime)dt, year, month, day, etc...
Table - Schedule: (from Calendar)dt, (id of Person)OnCall_NY, (id of Person)OnCall_MA, (id of Person)OnCall_CA
My question is: With the Schedule table, should I leave it structured as is, where the dt is a unique key, or should I rearrange it so that dt is non-unique and the table looks like this:
Table - Schedule: (from Calendar)dt, (from Team)ID, (from Person)ID
and have multiple entries for each day, OR would it make sense to just use:
Table - Schedule: (from Calendar)dt, (from PersonTeam)KeyID - [make a key ID on each of the person/team pairings]
A team will always have someone on call, but a person can be on call for more than one team at a time (if they are on multiple teams).
If a completely different setup would work better let me know too!
Thanks for any help! I apologize if my question is unclear. I'm learning fast but nevertheless still fairly new to using SQL daily, so I want to make sure I'm using best practices when I learn so I don't develop bad habits.
The current version, one column per team, is probably not a good idea. Since you're representing teams as a table (and not as an enum or equivalent), it means you expect to add/remove teams over time. That would force you to add/remove columns to the table, which is always a much larger task than adding/removing a few rows.
The 2nd option is the usual solution to a problem like this. A safe choice. You can always define an additional foreign key constraint from Schedule(teamID, personID) to PersonTeam to ensure you don't mistakenly assign schedule duty to a person not belonging to the team.
The 3rd option is pretty much equivalent to the 2nd, only you're swapping a composite natural key for PersonTeam for a surrogate simple key. Since the two components of said composite key are already surrogate, there is no advantage (in terms of immutability, etc.) to adding this additional one. Plus it would turn a very simple N-M relationship (PersonTeam) which most DB managers / ORMs will handle nicely into a more complex object which will need management on its own.
By Occam's razor, I'd do away with the additional surrogate key and use your 2nd option.
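For what it's worth, the 2nd option with the extra composite foreign key could be sketched like this, using Python's sqlite3 module. The snake_case names and sample rows are assumptions, and the Calendar table is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, lname TEXT, fname TEXT);
CREATE TABLE team   (id INTEGER PRIMARY KEY, team_name TEXT NOT NULL);

CREATE TABLE person_team (
    team_id   INTEGER NOT NULL REFERENCES team(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    PRIMARY KEY (team_id, person_id)
);

CREATE TABLE schedule (
    dt        TEXT    NOT NULL,
    team_id   INTEGER NOT NULL,
    person_id INTEGER NOT NULL,
    PRIMARY KEY (dt, team_id),
    -- Only people who belong to the team can be scheduled for it.
    FOREIGN KEY (team_id, person_id)
        REFERENCES person_team (team_id, person_id)
);
""")
conn.execute("INSERT INTO team VALUES (1, 'NY')")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Doe", "Jane"), (2, "Roe", "Rick")],
)
conn.execute("INSERT INTO person_team VALUES (1, 1)")  # only Jane is on NY


def assign(dt, team_id, person_id):
    """Try to put a person on call; return False if the database refuses."""
    try:
        conn.execute(
            "INSERT INTO schedule VALUES (?, ?, ?)", (dt, team_id, person_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


ok = assign("2024-01-01", 1, 1)        # Jane is a member of NY
rejected = assign("2024-01-02", 1, 2)  # Rick is not
print(ok, rejected)
```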
In my view, the answer may depend on whether the number of teams is fixed and fairly small. Of course, whether the names of the teams are fixed or not, may also matter, but that would probably have more to do with column naming.
More specifically, my view is this:
If the business requirement is to always have a small and fixed number of people (say, three) on call, then it may well be more convenient to allocate three columns in Schedule, one for every team to hold the ID of the appointed person, i.e. like your current structure:
dt OnCall_NY OnCall_MA OnCall_CA
--- --------- --------- ---------
with dt as the primary key.
If the number of teams (in the Team table) is fixed too, you could include teams' names/designators in the column names like you are doing now, but if the number of teams is more than three and it's just the number of teams in Schedule that is limited to three, then you could just use names like OnCallID1, OnCallID2, OnCallID3.
But even if that requirement is fixed, it may only turn out fixed today, and tomorrow your boss says, "We no longer work with a fixed number of teams (on call)", or "We need to extend the number of teams supported to four, and we may need to extend it further in the future". So, a more universal approach would be the one you are considering switching to in your question, that is
dt Team Person
--- ---- ------
where the primary key would now be dt, Team.
That way you could easily extend/reduce the number of people on call on the database level without having to change anything in the schema.
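A rough sketch of that row-per-team layout, run through Python's sqlite3 module. The team names come from the question, while the TX team, the person IDs, and the dates are made up for the example. Supporting a fourth team is an INSERT rather than an ALTER TABLE, and the (dt, team_id) key still allows only one person per team per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (id INTEGER PRIMARY KEY, team_name TEXT NOT NULL UNIQUE);
CREATE TABLE schedule (
    dt        TEXT    NOT NULL,
    team_id   INTEGER NOT NULL REFERENCES team(id),
    person_id INTEGER NOT NULL,
    PRIMARY KEY (dt, team_id)  -- one on-call person per team per day
);
""")
conn.executemany(
    "INSERT INTO team VALUES (?, ?)", [(1, "NY"), (2, "MA"), (3, "CA")]
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?)",
    [("2024-01-01", 1, 1), ("2024-01-01", 2, 2), ("2024-01-01", 3, 3)],
)

# A new team needs new rows only; the schema stays the same.
conn.execute("INSERT INTO team VALUES (4, 'TX')")
conn.execute("INSERT INTO schedule VALUES ('2024-01-01', 4, 7)")

on_call = conn.execute("""
    SELECT t.team_name, s.person_id
    FROM schedule s
    JOIN team t ON t.id = s.team_id
    WHERE s.dt = '2024-01-01'
    ORDER BY t.team_name
""").fetchall()

# A second person for the same team on the same day violates the key.
try:
    conn.execute("INSERT INTO schedule VALUES ('2024-01-01', 1, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(on_call, duplicate_rejected)
```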
UPDATE
I forgot to address your third option in my original answer (sorry). Here goes.
Your first option (the one actually implemented at the moment) seems to imply that every team can be represented by no more than one person. If you assign surrogate IDs to the Person/Team pairs and use those keys in Schedule instead of separate IDs for Person and Team, you will probably be unable to enforce that "one person per team in Schedule" requirement at the database level (or, at least, it might prove somewhat troublesome). With separate keys, it is enough to make Team part of a composite key (dt, Team) and you are done: no more than one row, and so no more than one person, per team per day.
Also, you may have difficulties letting a person change teams over time if their membership in a team was fixed in this way, i.e. with a Schedule reference to the Person/Team pair. You would probably have to change the Team reference in the PersonTeam table, which would misrepresent historical info: when looking at the people on call on a certain day in the past, the person's Team shown would be the one they belong to now, not the one they belonged to then.
Using separate IDs for people and teams in Schedule, on the other hand, would allow you to let people change teams freely, provided you do not make (Schedule.Team, Schedule.Person) a reference to (PersonTeam.Team, PersonTeam.Person), of course.

How to model table hierarchy

I'm trying to make an application about Formula 1. I have three tables: Team, Driver, and Race Results. I'm thinking about three options (and maybe I'm missing more):
Have a derived table Driver_Team. Have a Driver_TeamId in that table. Use that Driver_TeamId in the Race Results table. This seems to solve most of the queries I think I am going to use, but feels awkward and I haven't seen it anywhere.
Have Driver.DriverId and Team.TeamId in the Race Results table. This has the problem of not being able to add extra information. I don't know yet what information, maybe the date of the start of joining a new team. Then I would need a junction table (because that information is not Race Result related).
The last one: Have a junction table Driver_Team, but have only the Driver.DriverId as Foreign Key in the Race Results table. Problem is, queries like "How many points did team x get in season y/several seasons" become really, really horrible.
Am I missing another solution? If yes, please tell me! :-) Otherwise, which of these solutions seems the best?
Thanks!
Your first option gets my vote. I'd also suggest adding a Race table (to hold data such as track, date, conditions, etc.), and make Race_Results the combination of Driver_Team and Race.
I suggest the following:
RaceResult - Driver - DriverTeam - Team
Where RaceResult contains race_date, DriverTeam contains ( driver_id, team_id, team_join_date and team_leave_date ). Then you would be able to get all the info you're asking about in your question, even though the queries may be complicated.
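To give an idea of how "complicated" those queries get, here is a sketch of the points-per-team-per-season query against that layout, run through Python's sqlite3 module. The points column, the key choices, and all sample rows are invented for the example. The team is found by matching the race date against the driver's join and leave dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team   (team_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE driver (driver_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE driver_team (
    driver_id       INTEGER NOT NULL REFERENCES driver(driver_id),
    team_id         INTEGER NOT NULL REFERENCES team(team_id),
    team_join_date  TEXT NOT NULL,
    team_leave_date TEXT,  -- NULL while the driver is still with the team
    PRIMARY KEY (driver_id, team_join_date)
);

CREATE TABLE race_result (
    race_date TEXT    NOT NULL,
    driver_id INTEGER NOT NULL REFERENCES driver(driver_id),
    points    INTEGER NOT NULL,
    PRIMARY KEY (race_date, driver_id)
);
""")
conn.executemany("INSERT INTO team VALUES (?, ?)", [(1, "Team A"), (2, "Team B")])
conn.execute("INSERT INTO driver VALUES (1, 'Driver X')")
conn.executemany(
    "INSERT INTO driver_team VALUES (?, ?, ?, ?)",
    [(1, 1, "2010-01-01", "2010-12-31"), (1, 2, "2011-01-01", None)],
)
conn.executemany(
    "INSERT INTO race_result VALUES (?, ?, ?)",
    [("2010-03-14", 1, 25), ("2010-03-28", 1, 18), ("2011-03-27", 1, 10)],
)

# Each result is attributed to the team the driver was with on race day.
team_points = conn.execute("""
    SELECT t.name, strftime('%Y', rr.race_date) AS season, SUM(rr.points)
    FROM race_result rr
    JOIN driver_team dt
      ON dt.driver_id = rr.driver_id
     AND rr.race_date >= dt.team_join_date
     AND (dt.team_leave_date IS NULL OR rr.race_date <= dt.team_leave_date)
    JOIN team t ON t.team_id = dt.team_id
    GROUP BY t.name, season
    ORDER BY season, t.name
""").fetchall()
print(team_points)
```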
Just brainstorming, one object model may look like this. Note the conspicuous lack of an "id" field on RaceResult, as the finishing position acts perfectly as a natural key (one driver per finishing position). Of course, there may be lots of other options as well.
Team:
id
name
Driver:
id
name
team_id
Race:
id
venue
date
RaceResults:
position
driver_id
race_id
For the kind of queries you're talking about, I think DriverId and TeamId should both be in RaceResults. If you want to store additional information about an association between a driver and a team, then that should be placed in a separate table. This appears to create a little bit of redundancy, since the driver/team pair in the race table will be limited by the employment dates in the DriverTeam table, but given the complexities of contracts and schedules, I think it may end up being not especially redundant.
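As a quick sketch of why this keeps the queries simple: with both IDs stored on each result, team points per season are a plain GROUP BY with no date-range matching. The example uses Python's sqlite3 module; the season and points columns and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race (
    race_id INTEGER PRIMARY KEY,
    venue   TEXT,
    season  INTEGER NOT NULL
);
CREATE TABLE race_result (
    race_id   INTEGER NOT NULL REFERENCES race(race_id),
    driver_id INTEGER NOT NULL,
    team_id   INTEGER NOT NULL,  -- the team the driver raced for that day
    points    INTEGER NOT NULL,
    PRIMARY KEY (race_id, driver_id)
);
""")
conn.executemany(
    "INSERT INTO race VALUES (?, ?, ?)",
    [(1, "Track 1", 2010), (2, "Track 2", 2010), (3, "Track 1", 2011)],
)
conn.executemany(
    "INSERT INTO race_result VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 25), (1, 2, 1, 18), (2, 1, 1, 15), (3, 1, 2, 25)],
)

# Points per team per season, straight from the results table.
team_points = conn.execute("""
    SELECT r.season, rr.team_id, SUM(rr.points)
    FROM race_result rr
    JOIN race r ON r.race_id = rr.race_id
    GROUP BY r.season, rr.team_id
    ORDER BY r.season, rr.team_id
""").fetchall()
print(team_points)
```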
I like the way you are planning the DB to support your queries. I have run into way too much OOP thinking in DB design over the years!
If you only store DriverId and TeamId in the RaceResults table, then you cannot associate a driver to a team without a RaceResult.

Adding new fields vs creating separate table

I am working on a project where there are several types of users (students and teachers). Currently to store the user's information, two tables are used. The users table stores the information that all users have in common. The teachers table stores information that only teachers have with a foreign key relating it to the users table.
users table
id
name
email
34 other fields
teachers table
id
user_id
subject
17 other fields
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user use users.id. Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
e.g.
users
id
name
email
subject
51 other fields
Is this too many fields for one table? Will this impede performance?
I think this design is fine, assuming that most of the time you only need the user data, and that you know when you need to show the teacher-specific fields.
In addition, you get only teachers just by doing a JOIN, which might come in handy.
Tomorrow you might have another kind of user who is not a teacher, and you'll be glad of the separation.
Edited to add: yes, this is an inheritance pattern, but since he didn't say what language he was using I didn't want to muddy the waters...
In the rest of the database, there are no references to teachers.id. All other tables who need to relate to a user use users.id.
I would expect relating to the teacher_id for classes/sections...
Since a user will only have one corresponding entry in the teachers table, should I just move the fields from the teachers table into the users table and leave them blank for users who aren't teachers?
Are you modelling a system for a high school, or post-secondary? Reason I ask is because in post-secondary, a user can be both a teacher and a student... in numerous subjects.
I would think it fine provided neither you nor anyone else succumbs to the temptation to reuse 'empty' columns for other purposes.
By this I mean, there will in your new table be columns that are only populated for teachers. Someone may decide that there is another value they need to store for non-teachers, and use one of the teacher's columns to hold it, because after all it'll never be needed for this non-teacher, and that way we don't need to change the table, and pretty soon your code fills up with things testing row types to find what each column holds.
I've seen this done on several systems (for instance, when loaning a library book, if the loan is a long loan the due date holds the date the book is expected back. but if it's a short loan the due date holds the time it's expected back, and woe betide anyone who doesn't somehow know that).
It's not too many fields for one table (although without any details it does seem kind of suspicious). And worrying about performance at this stage is premature.
You're probably dealing with very few rows and a very small amount of data. Your concerns should be 1) getting the job done 2) designing it correctly 3) performance, in that order.
It's really not that big of a deal (at this stage/scale).
I would not stuff all fields in one table. Student to teacher ratio is high, so for 100 teachers there may be 10000 students with NULLs in those 17 fields.
Usually, a model would look close to this:
In your case, there are no student-specific fields, so you can omit the Student table, and the model would look like this:
Note that for inheritance modeling, the Teacher table has UserID, same as the User table; contrast that to your example which has an Id for the Teacher table and then a separate user_id.
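In table terms, the shared-key version could be sketched as follows, using Python's sqlite3 module. Only a couple of the columns from the question are shown, and the sample rows are made up. The teachers table has no id of its own: user_id is both its primary key and a foreign key to users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
);

-- Same key as users: at most one teacher row per user, no separate id.
CREATE TABLE teachers (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    subject TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Ann", "ann@example.com"), (2, "Ben", "ben@example.com")],
)
conn.execute("INSERT INTO teachers VALUES (1, 'Math')")

# An inner join returns teachers only.
only_teachers = conn.execute("""
    SELECT u.name, t.subject
    FROM users u
    JOIN teachers t ON t.user_id = u.id
""").fetchall()

# A left join returns everyone, with NULL teacher fields for non-teachers.
all_users = conn.execute("""
    SELECT u.name, t.subject
    FROM users u
    LEFT JOIN teachers t ON t.user_id = u.id
    ORDER BY u.id
""").fetchall()
print(only_teachers, all_users)
```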
It won't really hurt performance, but the other programmers might hurt you if you don't redesign it :) (55-field tables??)