SQL Database Design Many to Many - sql

I am creating a database based on a sporting game to store matches and each player involved in each match. I am having trouble resolving a many to many relationship. I currently have the following tables:
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
home_team
goals_for
goals_against
There will always be a minimum of two players in a match. Is this the best design for this approach?

I would recommend a sticking with a many to many relationship. This allows you to change the specifications of how many players you can have in a game easily while not complicating the data model much.
Player
id
name
Match
id
date
PlayerMatch
player_id
match_id
is_home
goals_for
goals_against
Foreign key from PlayerMatch to Player
Foreign key from PlayerMatch to Match
--All the matches a player has played in.
SELECT m.*
FROM Player p
JOIN PlayerMatch pm
ON p.id = pm.player_id
JOIN Match m
ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
--All the players in a match
SELECT p.*
FROM Match m
JOIN PlayerMatch pm
ON m.id = pm.match_id
JOIN Player p
ON p.id = pm.player_id
WHERE m.id = /*your match Id*/
--player information for a single match.
SELECT pm.*
FROM Player p
JOIN PlayerMatch pm
ON p.id = pm.player_id
JOIN Match m
ON m.id = pm.match_id
WHERE p.id = /*your player Id*/
AND m.id = /*your match Id*/

That is a valid option, though I would suggest a naming convention where you use the same column name in both tables (i.e. use match_id in both Match and PlayerMatch; same for player_id). This helps make your SQL a bit more clear and when doing joins in some databases (MySQL) you can then use the 'using (col1, col2, ...)' syntax for the joins.

I wouldn't use the many-to-many relationship, I would do like this:
Player
id
name
Match
id
home_player_id
guest_player_id
date
goals_home_player
goals_guest_player

I think I'd try to model the match first & then see what happens with the table design :
Match
-------------
match_Id
player1_Id
player2_Id
player1_Goals
player2_Goals
Where player1_Id and Player2_Id are both foreign keys onto the Player table
Player
---------
Id
Name
By convention player1 would always be the home team
then you would query it like
Select p1.name as player1_Home, p2.name as player2_away,
matchId,
player1_Goals as homeGoals, player2_Goals as awayGoals
from Match m
inner join Player p1 on p1.id = m.Player1_Id
inner join Player p2 on p2.id = m.Player2_Id

This sort of data relationship is not at all unnatural. To set it up, just ask yourself two questions:
Do players have more than one match?
Do matches have more than one player?
If the answer is yes to both, then you have a many-to-many relationship and these are not at all uncommon. Their implementation is only slightly more complicated. In a one-to-many relationship, you'd hold a foreign key to a list of records in some table. As it happens, this is still how it works in many-to-many relationships, except that both the Players and the Matches table will need a foreign key to some list of records.
This list is called the Bridge Table. So you'll need to use a total of three tables to descrive the relationship
Players
-------
player_id
<player attribute columns, eg last_name, first_name, goals_scored, etc.>
Player_Match
------------
player_id
match_id
Matches
-------
match_id
<a list of columns that are match attributes, eg. match date, etc.>
The table in the middle of the diagram above is called a bridge table, and it does nothing more than map players to matches, and it also maps matches to a list of players. Often, bridge tables have only 2 columns, each representing a foreign key to one of the bridged tables. There is no need for a primary key in a bridge table, and if there is not one, it means that a player can have more than one of the same match. If a player can have only one of one kind of match, then make the primary key for each row of the bridge table a composite key on both of the columns.
In database design, normalization is a highly desirable relational goal because it provides a database with the greatest possible flexibility and the lowest amount of redundancy. To normalize, ask yourself if the data you want to put in a table is -really- an actual attribute of the object described by the primary key. For example, is the home_team an actual attribute of the match. I would say no, it is not. In this case, you should replace home_team in your PlayerMatch table with a foreign key to a Teams table. In your Matches table, you ought to have two columns. One for a home team foreign key, and one for the away team key. The teams are not actual attributes of a match and so to normalize the Match table, you'd want to put those data in tables of their own.

Agree with M Hagopian, the op schema looks like a good start.

Related

How to shownicknames instead of foreign key in SQL

So I'm practicing queries on SQL for educational purposes and I'm running queries from an NBA Basketball dataset. I want to make a query that will show me a date, home team, away team, and winner column. I can easily do this with this query:
SELECT game_id, SUBSTRING(date,0,11) AS Date_Only, team_id_home, team_id_away, team_id_winner
FROM game
However, these only show the id's of the teams. I don't want to see the foreign key id's of the teams, I want to see the nicknames for the teams which there's a separate table where the team id's and their nicknames are. How am I able to show the nicknames instead of the foreign keys? I've thought of a few things but there may be an easier way to do this. Appreciate the help.
The query needs to join the table that contains the nicknames so that you can select the appropriate column, e. g.
SELECT game_id, SUBSTRING(date,0,11) AS Date_Only,
home_team.nickname AS home_team_nickname,
away_team.nickname AS away_team_nickname,
winner_team.nickname AS winner_team_nickname
FROM game
INNER JOIN team home_team ON game.team_id_home = home_team.team_id
INNER JOIN team away_team ON game.team_id_away = away_team.team_id
INNER JOIN team winner_team ON game.team_id_winner = winner_team.team_id
;

Writing SQL query to find ranking

I'm trying to determine for a given person how many people have a better score than they do, and group it by the different teams they belong to. So, in the tables below, I'm grabbing the list of team_id from the team_person table where the person_id matches the person I care about. That will get me all of the teams I belong to.
Then I need to know each person_id that is in any team I belong to so that I can find out what their maximum score is from the performances table.
Once I have that, I finally want to determine, for each team_id, how many people on that team have a better score than I do, where better is simply defined as having a larger value.
I've gotten way beyond my abilities with SQL at this point. What I have so far, which seems to get me the maximum score for all the people I care about, (basically everything but my final "by team" requirement) is this:
SELECT person_id, MAX(score) m
FROM performances
WHERE category_id = 7 AND person_id IN (
-- Find all the people on the teams I belong to
SELECT DISTINCT person_id
FROM team_person
WHERE team_id IN (
-- Find all the teams that I belong to
SELECT DISTINCT team_id
FROM team_person
WHERE person_id = 2
)
)
GROUP BY person_id
ORDER BY 2 DESC
My two relevant tables are defined like so, and I'm using psql 9.1.15
Table "public.team_person"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
ident | integer | not null default nextval('team_person_ident_seq'::regclass)
team_id | integer | not null
person_id | integer | not null
*chop extraneous columns*
Indexes:
"team_person_pkey" PRIMARY KEY, btree (ident)
"teamPersonUnique" UNIQUE CONSTRAINT, btree (team_id, person_id)
Foreign-key constraints:
"team_person_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
"team_person_team_id_fkey" FOREIGN KEY (team_id) REFERENCES team(ident) ON DELETE CASCADE
Referenced by:
TABLE "roster" CONSTRAINT "roster_team_person_id_fkey" FOREIGN KEY (team_person_id) REFERENCES team_person(ident) ON DELETE SET NULL
Triggers:
update_team_person_modified BEFORE INSERT OR UPDATE ON team_person FOR EACH ROW EXECUTE PROCEDURE update_modified_column()
Table "public.performances"
Column | Type | Modifiers
-------------+--------------------------+--------------------------------------------------------------
ident | bigint | not null default nextval('performances_ident_seq'::regclass)
category_id | integer | not null
person_id | integer | not null
score | real | not null
*chop extraneous columns*
Indexes:
"performances_pkey" PRIMARY KEY, btree (ident)
Foreign-key constraints:
"performances_category_id_fkey" FOREIGN KEY (category_id) REFERENCES performance_categories(ident) ON DELETE CASCADE
"performances_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(ident) ON DELETE CASCADE
First, state just the problem, without assumptions about how to get to the solution. You've done that fairly well:
determine for a given person how many people have a better score than they do, and group it by the different teams they belong to.
but I'd rephrase a bit:
For each team a given person is a member of, how many people in that team have a better score than the subject person?
I don't know about you, but it suddenly seems simpler now. Take the team table, left outer join team_person and filter for teams we're a member of, left outer join performances to find games we played with that team, left outer join team_person again to get other people who're members of each team, left outer join performances, filter out teams the subject person isn't a member of, group and aggregate.
It's underspecified for some corner cases (like a team where you're the only member, or a team where you didn't play a game), but eh, whatever.
Problems:
There's no team table. Since you don't care about anything in the team table, you can omit it from the join and just use team_person as the join root.
Your team_person table is defective, by the way. It should have a UNIQUE constraint on (team_id, person_id). Or, better, that should be the primary key. It doesn't actually matter for this query because duplicate team memberships won't change the result, but it's bad data modelling. You can't be a member of a team more than once.
performances should also have a column identifying the particular game or whatever. Since you haven't shown one, I'm going to assume you mean that you're looking for people who, in any game, performed better than the subject person at least once, in that game or another game. If you actually want to find people who did better in a particular game then you need a suitable key on performances.
Fatal problem: performances is also missing a column linking the performance to the team. This makes it impossible to properly solve the problem because you can't get performances by a given person on a given team. I'm going to assume there is in fact a team_id on performances and you just left it out.
So, allowing for the above issues, I'd first acquire the data with a big join, then group and aggregate it. This join will give us, for each team we played in, for each of our performances, for each other player, for each of their other performances, one row with all the relevant information. You can then compare performances and aggregate.
The below is totally untested, since you didn't provide sample data and you chopped important parts out of your schema (or the schema is defective), but I'd try something like:
SELECT
my_performances.team_id,
-- Find how many distinct people scored better than us at least once,
-- no matter how many times or in which game.
COUNT(distinct other_team_person.person_id)
-- Start the join with our team memberships and how we scored in each.
-- If we didn't play any games for this team don't produce a result row
-- for it, so INNER JOIN.
FROM team_person my_team_person
INNER JOIN performances my_performances ON
(my_performances.person_id = my_team_person.person_id
AND my_performances.team_id = my_team_person.team_id)
-- Other members of teams we're also a member of, skipping
-- ourselves. An `INNER JOIN` is fine here because we know
-- a team with only ourselves as a member isn't interesting
-- and we might as well skip it.
INNER JOIN team_person others_team_person ON (
my_team_person.team_id = other_team_person.team_id
AND my_team_person.person_id <> other_team_person.person_id)
-- How each of those people performed in each team they're in
-- (because of previous filter, only considers teams we're in too).
-- INNER JOIN because if they never played they can't beat us.
INNER JOIN performances other_performances ON (
other_team_person.person_id = other_performances.person_id
AND other_team_person.team_id = other_performances.team_id)
-- Make sure `my_team_person` is only teams we're a member of
WHERE my_team_person.person_id = $1
-- Also discard rows where the other person didn't do better than us
AND my_performances.score < other_performances.score
-- Emit one row per team we're a member of
GROUP BY my_performances.team_id;
If you want to show teams where you never played and teams where you're the only player, you'll need to change some INNER JOINs to LEFT OUTER JOINs.
If you want to compare to find people who beat you only within a given game, you're going to need an extra column on performances, then an extra term in the join on other_performances to restrict it to only matching in the same game as my_performances.

SQL Trying to Outer Join Tables but not working

So, I've a Uni assignment and the lecturer has picked this week to be ill and unable to answer questions.
We've been given a baseball database made up of 4 tables to work with. Table structures are as follows:
TABLENAME:(column1, column2...etc) PK = Primary Key, FK = Foreign Key
PLAYER:(num PK, name, dob, team FK, position)
GAME:(num PK, gamedate, hometeam FK, awayteam FK, homescore,
awayscore)
GAMESTAT:(gamenum PK, playernum FK, homeruns, strikeout)
TEAM:(code PK, name, town, ground)
The aim of this particularly question is to obtain the name of the stadium's (ground in team table), the sum of the home runs scored on that ground, the sum of the strikeouts and then the sum of these two values within a specified date range.
My query and issue are below:
SELECT
t.ground AS GROUNDPLAYED,
SUM(gs.homeruns) as TOTALHOMERUNS,
SUM(gs.strikeouts) AS TOTALSTRIKEOUTS,
SUM(gs.homeruns + gs.strikeouts) AS COMBINEDTOTAL
FROM team t
LEFT OUTER JOIN game g ON g.hometeam = t.code
LEFT OUTER JOIN gamestat gs ON g.num = gs.gamenum
WHERE g.gamedate BETWEEN '7-AUG-2014' AND '13-AUG-2014'
GROUP BY t.ground;
My problem lies in the fact that I get the correct values for games played but regardless of using the LEFT OUTER JOIN, I'm not getting all the stadium's to list. I'm convinced it has to do with the fact that I have had to join to the hometeam from the GAME table and it can only pick the home stadiums based on that.
Any help you may be able to offer would be much appreciated.
Move your WHERE clause to the ON clause for the join to gamestat.
By imposing the filter criteria in the WHERE clause, it occurs after the join has been performed, removing the stadiums with no activity. Once this predicate is moved to the appropriate ON clause it will filter the gamestat's before the join instead of after.
You have experienced the good fortune to encounter this important quirk of SQL, that the positioning of predicates affects the result-set, early in your education.

Unexpected results after joining another table

I use three tables to get to the final result. They are called project_board_members, users and project_team.
This is the query:
SELECT `project_board_members`.`member_id`,
`users`.`name`,
`users`.`surname`,
`users`.`country`,
`project_team`.`tasks_completed`
FROM `project_board_members`
JOIN `users`
ON (`users`.`id` = `project_board_members`.`member_id`)
JOIN `project_team`
ON (`project_team`.`user_id` = `project_board_members`.`member_id`)
WHERE `project_board_members`.`project_id` = '5'
You can ignore last line because it just points to the project I'm using.
Table project_board_members holds three entries and have structure like:
id,
member_id,
project_id,
created_at;
I need to get member_id from that table. Then I join to users table to get name, surname and country. No problems. All works! :)
After that, I needed to get tasks_completed for each user. That is stored in project_team table. The big unexpected thing is that I got four entries returned and the big what-the-f*ck is that in the project_board_members table are only three entries.
Why is that so? Thanks in advice!
A SQL join creates a result set that contains one row for each combination of the left and right tables that matches the join conditions. Without seeing the data or a little more information it's hard to say what exactly is wrong from what you expect, but I'm guessing it's one of the following:
1) You have two entries in project_team with the same user_id.
2) Your entries in project_team store both user_id and project_id and you need to be joining on both of them rather than just user_id.
The table project_board_members represent what is called in the Entity-Relationship modelling world an "associative entity". It exists to implement a many-to-many relationship (in this case, between the project and user entities. As such it is a dependent entity, which is to say that the existence of an instance of it is predicated on the existence of an instance of each of the entities to which it refers (a user and a project).
As a result, the columnns comprising the foreign keys relating to those entities (member_id and project_id) must be form part or all of the primary key.
Normally, instances of an associative entity are unique WRT the entities to which it relates. In your case the relationship definitions would be:
Each user is seated on the board of 0-to-many projects;
Each project's board is comprise of 0-to-many users
which is to say that a particular user may not be on the board of a particular project more than once. The only reason for adding other columns (such as your id column) to the primary key would be if the user:project relationship is non-unique.
To enforce this rule -- a user may sit on the board a particular project just once -- the table schema should look like this:
create table project_board_member
(
member_id int not null foreign key references user ( user_id ) ,
project_Id int not null foreign key references project ( project_id ) ,
created_at ...
...
primary key ( member_id , project_id ) ,
)
}
The id column is superfluous.
For debugging purposes do
SELECT GROUP_CONCAT(pbm.member_id) AS member_ids,
GROUP_CONCAT(u.name) as names,
GROUP_CONCAT(u.surname) as surnames,
GROUP_CONCAT(u.country) as countries,
GROUP_CONCAT(pt.tasks_completed) as tasks
FROM project_board_members pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)
WHERE pbm.project_id = '5'
GROUP BY pbm.member_id
All the fields that list multiple entries in the result are messing up the rowcount in your resultset.
To Fix that you can do:
SELECT pbm.member_id
u.name,
u.surname,
u.country,
pt.tasks_completed
FROM (SELECT
p.project_id, p.member_id
FROM project_board_members p
WHERE p.project_id = '5'
LIMIT 1
) AS pbm
JOIN users u
ON (u.id = pbm.member_id)
JOIN project_team pt
ON (pt.user_id = pbm.member_id)

SQL Join 2 tables

I have two tables one named Person, which contains columns ID and Name and the second one, named Relation, which contains two columns, each of which contains an ID of a Person. It's about a relation between customer and serviceman. I'd like to Join these two tables so that I'll have names of people in every relation. Is it possible to write this query with some kind of joining?
EDIT::
I must do something wrong, but it's not working. I had tried a lot of forms of so looking queries, but I had been only getting one column or some errors. It's actually the school task, I have it already done (with different JOIN query). Firstly I had been trying to do this, but I'd failed: It seems to be very common situation, so I don't know why it's too complicated for me..
Here are my tables:
CREATE TABLE Oprava..(Repair) (
KodPodvozku INTEGER PRIMARY KEY REFERENCES Automobil(KodPodvozku),
IDzakaznika..(IDcustomer) INTEGER REFERENCES Osoba(ID),
IDzamestnance..(IDemployee) INTEGER REFERENCES Osoba(ID)
);
CREATE TABLE Osoba..(Person) (
ID INTEGER CONSTRAINT primaryKeyOsoba PRIMARY KEY ,
Jmeno..(Name) VARCHAR(256) NOT NULL,
OP INTEGER UNIQUE NOT NULL
);
It's in Czech, but the words in brackets after ".." are english equivalents.
PS: I am using Oracle SQL.
Assuming your tables are:
persons: (id, name)
relations: (customer_id, serviceman_id)
Using standard SQL:
SELECT p1.name AS customer_name,
p2.name AS serviceman_name
FROM persons p1
JOIN relations ON p1.id=relations.customer_id
JOIN persons p2 ON relations.serviceman_d=p2.id;
Further explanation
The join creates the following table:
p1.id|p1.name|relations.customer_id|relations.serviceman_id|p2.id|p2.name
Where p1.id=relations.customer_id, and p2.id=relations.serviceman_id. The SELECT clause chooses only the names from the JOIN.
Note that if all the ids from relations are also in persons, the result size would be exactly the size of the relations table. You might want to add a foreign key to verify that.
SELECT *
FROM Relation
INNER JOIN Person P1
ON P1.ID = Relation.FirstPersonID
INNER JOIN Person P2
ON P2.ID = Relation.SecondPersonID
SELECT p1.name AS customer, p2.name AS serciveman
FROM person p1, person p2, relation
WHERE p1.id=relation.customerid AND p2.id=relation.servicemanid
Person(ID, Name)
Relation(ID)
You don't mention the other columns that relation contains but this is what you need:
Select name
from Person as p
join Relation as r
on p.ID = r.ID
This is an INNER JOIN as are most of the other answers here. Please don't use this until you understand that if either record doesn't have a relationship in the other table it will be missing from the dataset (i.e. you can lose data)
Its very important to understand the different types of join so I would use this as an opportunity.