Confusion over relationship in a tennis tournament context - sql

I'm making a simple tennis database that features:
Tournaments (Mens/womens singles, Mens/womens doubles, Mixed doubles)
Players
Results
My Results table looked something like
ResultID
DatePlayed
Player1ID (became ParticipantID1)
Player1Score
Player2ID (became ParticipantID2)
Player2Score
I then realised that a player could play in more than one tournament so I needed another table, Participant (or could be called team, or couple).
ParticipantID
PlayerID
TournamentID
With all three ID's being composite.
Now Joe (ID: 1) and Tom (ID: 2) can be a doubles partner but Joe can still play in the Singles
Participant ID -- PlayerID -- TournamentID
1 ----------------- 1 ------------ 1
1 ----------------- 2 ------------ 1
2 ----------------- 1 ------------ 2
Now the problem I have with this is that auto-increment/identify would have to be turned for the Participant table meaning more coding work would have to be done (when creating and validating etc).
The second problem I have with this is that all the singles players data would be repeating, as they would be stored in the Player table and also the Participant table.
I could have another Results table, i.e. DoubleResults, but I feel this isn't necessary.
Is this the only way it can be done or has my mind just gone blank?

A tournament will have many matches. Therefore you need a one to many relationship between tournaments and matches.
A match will have a type (singles, doubles, etc). So your match table has to include a type id.
You will need a many to many relationship between teams and matches. A team can have either one or two players, depending on the type of match. At this point, it comes down to how much detail you want to store. It might be simply good enough to store the winning team id in this table.

Related

Structure for 3 or more tables with relationship

I need help about database structure,
so I need to join 3 tables but I'm confuse what structure is the right way.
Region_table
id region_name
1 Region1
2 Region2
Province
id province_name
1 Province1
City
id city_name
1 city1
I'm thinking to create new table that connects all.
like id, region_id, province_id, city_id
region_province_city
id region_id province_id city_id
1 1 1 1
2 1 1 2
or I can do like
Province
id province_name region_id
1 Province1 1
City
id city_name province_id
1 city1 1
I would generally suggest the hierarchical structure. That is, adding the parent id to each of the two tables.
Why? This guarantees that the region on a given province is always the same. That can be hard to guarantee in the 3-way junction table. After all, you don't have an independent relationship among the three entities. You have a hierarchical relationship and you would like to enforce that data integrity.
An example of an independent relationship would be:
Customers
Items
Payment Method
Presumably a customer could purchase any item using any payment method -- and even use different payment methods on the same item.
Would I always recommend this structure? Well, no. If the data is not being modified, then you are pretty safe with the three-way table. It works very well for looking up information about a "city". It can be verified when created. And it might provide some simplification (particularly if the hierarchies are deeper).
That said, it works best when your joins are to the lowest level. It is a little tricky to get the region for a province. A typical solution would be to augment the data with NULL values for city to get that information.

Purpose of Self-Joins

I am learning to program with SQL and have just been introduced to self-joins. I understand how these work, but I don't understand what their purpose is besides a very specific usage, joining an employee table to itself to neatly display employees and their respective managers.
This usage can be demonstrated with the following table:
EmployeeID | Name | ManagerID
------ | ------------- | ------
1 | Sam | 10
2 | Harry | 4
4 | Manager | NULL
10 | AnotherManager| NULL
And the following query:
select worker.employeeID, worker.name, worker.managerID, manager.name
from employee worker join employee manager
on (worker.managerID = manager.employeeID);
Which would return:
Sam AnotherManager
Harry Manager
Besides this, are there any other circumstances where a self-join would be useful? I can't figure out a scenario where a self-join would need to be performed.
Your example is a good one. Self-joins are useful whenever a table contains a foreign key into itself. An employee has a manager, and the manager is... another employee. So a self-join makes sense there.
Many hierarchies and relationship trees are a good fit for this. For example, you might have a parent organization divided into regions, groups, teams, and offices. Each of those could be stored as an "organization", with a parent id as a column.
Or maybe your business has a referral program, and you want to record which customer referred someone. They are both 'customers', in the same table, but one has a FK link to another one.
Hierarchies that are not a good fit for this would be ones where an entity might have more than one "parent" link. For example, suppose you had facebook-style data recording every user and friendship links to other users. That could be made to fit in this model, but then you'd need a new "user" row for every friend that a user had, and every row would be a duplicate except for the "relationshipUserID" column or whatever you called it.
In many-to-many relationships, you would probably have a separate "relationship" table, with a "from" and "to" column, and perhaps a column indicating the relationship type.
I found self joins most useful in situations like this:
Get all employees that work for the same manager as Sam. (This does not have to be hierarchical, this can also be: Get all employees that work at the same location as Sam)
select e2.employeeID, e2.name
from employee e1 join employee e2
on (e1.managerID = e2.managerID)
where e1.name = 'Sam'
Also useful to find duplicates in a table, but this can be very inefficient.
There are several great examples of using self-joins here. The one I often use relates to "timetables". I work with timetables in education, but they are relevant in other cases too.
I use self-joins to work out whether two items clash with one another, e.g. a student is scheduled for two lessons which happen at the same time, or a room is double booked. For example:
CREATE TABLE StudentEvents(
StudentId int,
EventId int,
EventDate date,
StartTime time,
EndTime time
)
SELECT
se1.StudentId,
se1.EventDate,
se1.EventId Event1Id,
se1.StartTime as Event1Start,
se1.EndTime as Event1End,
se2.StartTime as Event2Start,
se2.EndTime as Event2End,
FROM
StudentEvents se1
JOIN StudentEvents se2 ON
se1.StudentId = se2.StudentId
AND se1.EventDate = se2.EventDate
AND se1.EventId > se2.EventId
--The above line prevents (a) an event being seen as clashing with itself
--and (b) the same pair of events being returned twice, once as (A,B) and once as (B,A)
WHERE
se1.StartTime < se2.EndTime AND
se1.EndTime > se2.StartTime
Similar logic can be used to find other things in "timetable data", such as a pair of trains it is possible to take from A via B to C.
Self joins are useful whenever you want to compare records of the same table against each other. Examples are: Find duplicate addresses, find customers where the delivery address is not the same as the invoice address, compare a total in a daily report (saved as record) with the total of the previous day etc.

SQL query to get random unused combination

Background:
I want to create a database that can run a tournament of 1 vs 1 matchups. It needs to keep track of who won and lost each matchup and any comments about that matchup as well as decide the next unique matchup randomly.
Rules:
There are x number of players. Each player will eventually play every other player once, in effect covering all possible unique combinations of players.
Database Tables (with Sample data):
DECLARE #Players TABLE (
ID INT PRIMARY KEY IDENTITY,
Name VARCHAR(50)
)
ID Name
-- -----
1 Alex
2 Bob
3 Chris
4 Dave
DECLARE #Matches TABLE (
ID INT PRIMARY KEY IDENTITY,
WinnerId INT,
LoserId INT
)
ID WinnerId LoserId
-- -------- -------
1 1 2
2 4 2
3 3 1
DECLARE #Comments TABLE (
ID INT PRIMARY KEY IDENTITY,
MatchId INT,
Comment VARCHAR(MAX)
)
ID MatchId Comment
-- ------- ------------------------------
1 2 That was a close one.
2 3 I did not expect that outcome.
Problem:
How can I efficiently query to get a single random match up that has not yet occurred?
The major problem is that the number of player can and will grow over time. Right now in my example data I only have 4 players which leaves 6 possible matches.
Alex,Bob
Alex,Chris
Alex,Dave
Bob,Chris
Bob,Dave
Chris,Dave
That would be small enough to simply keep grabbing 2 random numbers that correspond to the Player's id and then check the matchups table if that matchup has already occurred. If it has: get 2 more and repeat the process. If it hasn't then use it as the next matchup. However if I have 10,000 players that would be 49995000 possible matchups and it would simply become too slow.
Can anyone point me in the right direction for a more efficient query? I am open to changes in the database design if that would help make things more efficient as well.
If you make an outer join between every possible pairing and those that have been played, then filter out the ones that have been played, you're left with pairings that have not yet been played. Selecting a random one is then a trivial case of ordering:
SELECT p1.Name, p2.Name FROM
Players p1
JOIN Players p2 ON (
p1.ID < p2.ID
)
LEFT JOIN Matches ON (
(WinnerId = p1.ID AND LoserId = p2.ID)
OR (WinnerId = p2.ID AND LoserId = p1.ID)
)
WHERE Matches.ID IS NULL
ORDER BY RAND()
LIMIT 1;
EDIT
As noted by ypercube below, the above LIMIT syntax is MySQL specific. You may need to use instead the appropriate syntax for your SQL implementation - let us know what it is and someone can advise, if required. I know that in Microsoft SQL Server one uses TOP and in Oracle ROWNUM, but otherwise your Googling is probably as good as mine. :)
I am wondering why you need to pick 2 players in random. How about generate the whole list of possible matches up front, but then add a WinnerId column? For the next match, just pick the first row which has no WinnerId set.
Although the data set is large, using the limit key will stop additional processing as soon as a single key is returned. One possibility might be to user a query like below to return the next match.
SELECT * FROM Players p1, Players p2 WHERE p1.ID <> p2.ID AND (p1.ID, p2.ID) NOT IN (Select WinnerID, LoserID FROM Matches) AND (p2.ID, p1.ID) NOT IN (Select WinnerID, LoserID FROM Matches) LIMIT 1
For your problem, you want A) to consider all 2-element subsets of players B) in a randomized order.
For A, other answers are suggesting using SQL joins with various conditions. A less database-intensive solution if you really need to handle 10,000 players might be to use an efficient combination generating algorithm. I found a previous answer listing some from TAOCP vol. 4 here. For the 2 element subset case, a simple double nested loop over the player ids in lexicographical sequence would be fine:
for player_a in 1..num_players:
for player_b in player_a+1..num_players:
handle a vs. b
For part B, you could use a second table mapping players 1..n to a shuffling of the integers 1..n. Keep this shuffled mapping around until you're done the tournament process. You can use the Knuth-Fisher-Yates shuffle.
To keep track of where you are in a instance of this problem, you'll probably want to be saving the combination generator's state to the database regularly. This would probably be faster than figuring out where you are in the sequence from the original tables alone.
As you mention, handling 10,000 players in matchups this way results in nearly fifty million matchups to handle. You might consider a tournament structure that doesn't require every player to compete against each other player. For example, if A beats B and B beats C, then you might not have to consider whether A beats C. If applicable in your scenario, that sort of shortcut could save a lot of time.

Schema Normalization :: Composite Game Schedule Constrained by Team

Related to the original generalized version of the problem:http://stackoverflow.com/questions/6068635/database-design-normalization-in-2-participant-event-join-table-or-2-column
As you'll see in the above thread, a game (event) is defined as exactly 2 teams (participants) playing each other on a given date (no teams play each other more than once in a day).
In our case we decided to go with a single composite schedule table with gameID PK, 2 columns for the teams (call them team1 & team2) and game date, time & location columns. Additionally, since two teams + date must be unique, we define a unique key on these combined fields. Separately we have a teams table with teamID PK related to schedule table columns team1 & team2 via FK.
This model works fine for us, but what I did not post in above thread is the relationship between scheduled games and results, as well as handling each team's "version" of the scheduled game (i.e. any notes team1 or team2 want to include, like, "this is a scrimmage against a non-divisional opponent and will not count in the league standings").
Our current table model is:
Teams > Composite Schedule > Results > Stats (tables for scoring & defense)
Teams > Players
Teams > Team Schedule*
*hack to handle notes issue and allow for TBD/TBA games where opponent, date, and/or location may not be known at time of schedule submission.
I can't help but think we can consolidate this model. For example, is there really a need for a separate results table? Couldn't the composite schedule be BOTH the schedule and the game result? This is where a join table could come into play.
Join table would effectively be a gameID generator consisting of:
gameID (PK)
gameDate
gameTime
location
Then revised composite schedule/results would be:
id (PK)
teamID (FK to teams table)
gameID (FK to join table)
gameType (scrimmage, tournament, playoff)
score (i.e. number of goals)
penalties
powerplays
outcome (win-loss-tie)
notes (team's version of the game)
Thoughts appreciated, has been tricky trying to drilldown to the central issue (thus original question above)
I don't see any reason to have separate tables for the schedule and results. However, I would move "gameType" to the Games table, otherwise you're storing the same value twice. I'd also consider adding the teamIDs to the Games table. This will serve two purposes: it will allow you to easily distinguish between home and away teams and it will make writing a query that returns both teams' data on the same row significantly easier.
Games
gameID (PK)
gameDate
gameTime
homeTeamID
awayTeamID
location
gameType (scrimmage, tournament, playoff)
Sides
id (PK)
TeamID (FK to teams table)
gameID (FK to games table)
score
penalties
powerplays
notes
As shown, I would also leave out the "Outcome" field. That can be effectively and efficiently derived from the "Score" columns.

Dynamic Tables?

I have a database that has different grades per course (i.e. three homeworks for Course 1, two homeworks for Course 2, ... ,Course N with M homeworks). How should I handle this as far as database design goes?
CourseID HW1 HW2 HW3
1 100 99 100
2 100 75 NULL
EDIT
I guess I need to rephrase my question. As of right now, I have two tables, Course and Homework. Homework points to Course through a foreign key. My question is how do I know how many homeworks will be available for each class?
No, this is not a good design. It's an antipattern that I called Metadata Tribbles. You have to keep adding new columns for each homework, and they propagate out of control.
It's an example of repeating groups, which violates the First Normal Form of relational database design.
Instead, you should create one table for Courses, and another table for Homeworks. Each row in Homeworks references a parent row in Courses.
My question is how do I know how many homeworks will be available for each class?
You'd add rows for each homework, then you can count them as follows:
SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
FROM Homeworks
GROUP BY CourseId
Of course this only counts the homeworks after you have populated the table with rows. So you (or the course designers) need to do that.
Decompose the table into three different tables. One holds the courses, the second holds the homeworks, and the third connects them and stores the result.
Course:
CourseID CourseName
1 Foo
Homework:
HomeworkID HomeworkName HomeworkDescription
HW1 Bar ...
Result:
CourseID HomeworkID Result
1 HW1 100