I have a quick question about many-to-many relationships in SQL.
In theory, I understand that if two entities in an ER model have an M:N relationship, we have to split it into two 1:N relationships by introducing an intersection/lookup table whose composite primary key is built from the keys of both parent tables. My question is: in addition to the composite primary key, can the intersection table have extra columns that do not exist in either parent table? That is, apart from intersectionTableId, table1ID and table2ID, can it have a fourth column that is entirely new? Please let me know.
In a word - yes. It's a common practice to denote properties of the relationship between the two entities.
E.g., consider you have a database storing the details of people and the sports teams they like:
CREATE TABLE person (
id INT PRIMARY KEY,
first_name VARCHAR(10),
last_name VARCHAR(10)
);
CREATE TABLE team (
id INT PRIMARY KEY,
name VARCHAR(10)
);
A person may like more than one team, and a team may have many fans, which is your classic M:N relationship table. But you could also add some details to this entity, such as when a person started liking a team:
CREATE TABLE fandom (
person_id INT NOT NULL REFERENCES person(id),
team_id INT NOT NULL REFERENCES team(id),
fandom_started DATE,
PRIMARY KEY (person_id, team_id)
);
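To see the extra column in use, here is a minimal sketch that runs the same three tables through SQLite via Python's sqlite3 module. The choice of SQLite and the sample rows (Ann, Lions, Hawks) are my own assumptions for illustration, not part of the schema above.

```python
import sqlite3

# In-memory SQLite database; foreign keys must be switched on explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fandom (
    person_id INTEGER NOT NULL REFERENCES person(id),
    team_id INTEGER NOT NULL REFERENCES team(id),
    fandom_started DATE,              -- attribute of the relationship itself
    PRIMARY KEY (person_id, team_id)
);
-- Hypothetical sample data
INSERT INTO person VALUES (1, 'Ann', 'Lee');
INSERT INTO team VALUES (10, 'Lions'), (20, 'Hawks');
INSERT INTO fandom VALUES (1, 10, '2015-06-01'), (1, 20, '2020-01-15');
""")

# Which teams does person 1 like, and since when?
rows = conn.execute(
    """
    SELECT t.name, f.fandom_started
    FROM fandom AS f
    JOIN team AS t ON t.id = f.team_id
    WHERE f.person_id = ?
    ORDER BY f.fandom_started
    """,
    (1,),
).fetchall()

# The composite primary key still prevents recording the same pair twice.
try:
    conn.execute("INSERT INTO fandom VALUES (1, 10, '2021-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The extra column lives only in fandom, since it describes neither the person nor the team on its own.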
Yes, you can do that by modeling the "relationship" table yourself explicitly (just like your other entities).
Here are some posts about exactly that question.
Create code first, many to many, with additional fields in association table
Entity Framework CodeFirst many to many relationship with additional information
We have 2 tables with a 1:1 relationship.
One table should reference the other; typically one would use a FK relationship.
Since there is a 1:1 relationship, we could also directly use the same Guid in both tables as primary key.
Additional info: the data is split into 2 tables since the data is rather separate, think "person" and "address" - but in a world where there is a clear 1:1 relationship between the 2.
Judging by the tags that were suggested to me, I assume this is called a "shared primary key".
Would using the same Guid as PK in 2 tables have any ill effects?
To consolidate info from the comments into an answer...
No, there are no ill effects of two tables sharing a PK.
You will still need to create a FK reference from the second table; the FK column will be the same as the PK column.
That said, your example of "Person" and "Address" in a 1:1 situation is not the best fit. This practice is commonly used for entities that extend one another. For example, a table "User" can hold common info on all users, while tables "Candidate" and "Recruiter" each expand on it, and all three tables can share the same PK. The programming language representation would likewise be classes that extend one another.
Another (similar) example would be a table that stores more detailed info than the base table, like "User" and "UserDetails". The relationship is 1:1 and there is no need to introduce an additional PK column.
Code sample where PK is also a FK:
CREATE TABLE [User]
(
id INT PRIMARY KEY
, name NVARCHAR(100)
);
CREATE TABLE [Candidate]
(
id INT PRIMARY KEY FOREIGN KEY REFERENCES [User](id)
, actively_looking BIT
);
CREATE TABLE [Recruiter]
(
id INT PRIMARY KEY
, currently_hiring BIT
, FOREIGN KEY (id) REFERENCES [User](id)
);
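As a rough check of how a shared PK/FK behaves, here is a small sketch using SQLite through Python's sqlite3 module. SQLite, the table name app_user (standing in for [User]) and the sample rows are assumptions made for this illustration; the answer's own code is T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE candidate (
    id INTEGER PRIMARY KEY REFERENCES app_user(id),  -- PK is also the FK
    actively_looking INTEGER
);
INSERT INTO app_user VALUES (1, 'alice');
INSERT INTO candidate VALUES (1, 1);
""")


def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# No app_user row with id 99 exists, so the FK blocks this row.
orphan = try_insert("INSERT INTO candidate VALUES (99, 0)")
# User 1 already has a candidate row, so the PK blocks a second one (keeps it 1:1).
duplicate = try_insert("INSERT INTO candidate VALUES (1, 0)")
```

The single id column does both jobs: the FK ties each extension row to a base row, and the PK guarantees at most one extension row per base row.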
PS: As mentioned, a GUID is not the best-suited column type for a PK because of performance issues, but that's another topic.
I have implemented the following ways of storing relational topology:
1. A general junction relation table:
Table: Relation
Columns: id parent_type parent_id parent_prop child_type child_id child_prop
Most SQL engines generally cannot execute joins against this kind of table.
2. Relation-specific junction tables:
Table: Class2Student
Columns: id parent_id parent_prop child_id child_prop
Joins can be executed against these tables.
3. Storing lists/string maps of related objects in a text field on both objects of a bidirectional relation:
Class: Class
Class properties: id name students
Table columns: id name students_keys
Rows: 1 "history" [{type:Basic_student,id:1},{type:Advanced_student,id:3}]
To enable joins by the SQL engine, it would be possible to write a custom module, which would be even easier if the contents of students_keys were simply [1,3], i.e. if a relation pointed to the explicit Student type.
The questions are the following in the context of:
I fail to see what the point of a junction table is. For example, I fail to see that the problems a junction table is claimed to relieve actually exist:
Inability to save bidirectional relations in a logically correct way (e.g. there is no data orphaning in bidirectional relations, or in any relation with a keys field, because one saves recursively and can enforce the other operations (delete, update) quite easily)
Inability to join effectively
I am not soliciting your personal opinions on best practices or any cult-like statements on normalization.
The explicit question(s) are the following:
What are the instances where one would want to query a junction table for something that querying an owning object's keys field does not provide?
What are logical implementation problems in the context of computation provided by the sql engine where the junction table is preferable?
The only implementation difference with regards to a junction table vs a keys fields is the following:
For a search of the following nature, you would need to match against the keys field using either a custom index or some other reasonable implementation:
class_dao.search({students:advanced_student_3,name:"history"});
search for Classes that have a particular student and name "history"
As opposed to searching the indexed columns of the junction table and then selecting the appropriate Classes.
I have been unable to identify why a junction table is logically preferable for quite literally any reason. I am not claiming that it isn't, nor do I have a religious preference one way or the other, as evidenced by the fact that I implemented multiple ways of achieving this. My problem is that I do not know what the reasons are.
The way I see it, you have several entities:
CREATE TABLE StudentType
(
Id Int PRIMARY KEY,
Name NVarChar(50)
);
INSERT INTO StudentType (Id, Name) VALUES
(1, 'Basic'),
(2, 'Advanced'),
(3, 'SomeOtherCategory');
CREATE TABLE Student
(
Id Int PRIMARY KEY,
Name NVarChar(200),
OtherAttributeCommonToAllStudents Int,
Type Int,
CONSTRAINT FK_Student_StudentType
FOREIGN KEY (Type) REFERENCES StudentType(Id)
)
CREATE TABLE StudentAdvanced
(
Id Int PRIMARY KEY,
AdvancedOnlyAttribute Int,
CONSTRAINT FK_StudentAdvanced_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
CREATE TABLE StudentSomeOtherCategory
(
Id Int PRIMARY KEY,
SomeOtherCategoryOnlyAttribute Int,
CONSTRAINT FK_StudentSomeOtherCategory_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
Any attributes that are common to all students have columns on the Student table.
Types of student that have extra attributes are added to the StudentType table.
Each extra student type gets a Student<TypeName> table to store its specific attributes. These tables have an optional one-to-one relationship with Student.
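To illustrate the optional one-to-one relationship, here is a minimal sketch that reads every student together with any type-specific attribute through a LEFT JOIN. It uses SQLite via Python's sqlite3 module, trims the tables to the columns needed, and invents the sample rows (Bo, Cy); all of that is assumed for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE StudentType (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Student (
    Id INTEGER PRIMARY KEY,
    Name TEXT,
    Type INTEGER REFERENCES StudentType(Id)
);
CREATE TABLE StudentAdvanced (
    Id INTEGER PRIMARY KEY REFERENCES Student(Id),  -- optional 1:1 extension
    AdvancedOnlyAttribute INTEGER
);
-- Hypothetical sample data
INSERT INTO StudentType VALUES (1, 'Basic'), (2, 'Advanced');
INSERT INTO Student VALUES (1, 'Bo', 1), (3, 'Cy', 2);
INSERT INTO StudentAdvanced VALUES (3, 42);
""")

# Every student appears once; the extension column is NULL (None in Python)
# for students that have no StudentAdvanced row.
rows = conn.execute(
    """
    SELECT s.Id, s.Name, t.Name, a.AdvancedOnlyAttribute
    FROM Student AS s
    JOIN StudentType AS t ON t.Id = s.Type
    LEFT JOIN StudentAdvanced AS a ON a.Id = s.Id
    ORDER BY s.Id
    """
).fetchall()
```

Because the extension table shares the student's key, the join needs no extra lookup column.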
I think that your "straw-man" junction table is a partial implementation of the EAV anti-pattern. The only time this is sensible is when you can't know what attributes you need to model, i.e. your data will be entirely unstructured. When this is a real requirement, relational databases start to look less desirable. On those occasions, consider a NoSQL/document database alternative.
A junction table would be useful in the following scenario.
Say we add a Class entity to the model.
CREATE TABLE Class
(
Id Int PRIMARY KEY,
...
)
It's conceivable that we would like to store the many-to-many relationship between students and classes.
CREATE TABLE Registration
(
Id Int PRIMARY KEY,
StudentId Int,
ClassId Int,
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
This entity would be the right place to store attributes that relate specifically to a student's registration for a class, a completion flag for instance. Other data would naturally relate to this junction, perhaps a class-specific attendance record or a grade history.
If you don't relate Class and Student in this way, how would you select both all the students in a class and all the classes a student reads? Performance-wise, this is easily optimised by indices on the key columns.
When a many-to-many relationship exists without any attributes, I agree that, logically, the junction table needn't exist. However, in a relational database, junction tables are still a useful physical implementation, perhaps like this:
CREATE TABLE StudentClass
(
StudentId Int,
ClassId Int,
CONSTRAINT PK_StudentClass PRIMARY KEY (ClassId, StudentId),
CONSTRAINT FK_StudentClass_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_StudentClass_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
this allows simple queries like
-- students in a class?
SELECT StudentId
FROM StudentClass
WHERE ClassId = @classId;
-- classes read by a student?
SELECT ClassId
FROM StudentClass
WHERE StudentId = @studentId;
Additionally, this enables a simple way to manage the relationship, partially or completely, from either side, in a way that will be familiar to relational database developers and sargable by query optimisers.
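For a concrete run of those queries, here is a minimal sketch on SQLite through Python's sqlite3 module. The engine, the extra index IX_StudentClass_Student and the sample rows are assumptions for illustration. The last query is one way to express the question's search for Classes named "history" that contain a particular student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Class (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE StudentClass (
    StudentId INTEGER NOT NULL REFERENCES Student(Id),
    ClassId INTEGER NOT NULL REFERENCES Class(Id),
    PRIMARY KEY (ClassId, StudentId)
);
-- Assumed extra index so lookups by student are indexed as well
CREATE INDEX IX_StudentClass_Student ON StudentClass (StudentId, ClassId);
-- Hypothetical sample data
INSERT INTO Student VALUES (1, 'Ann'), (3, 'Cy');
INSERT INTO Class VALUES (1, 'history'), (2, 'maths');
INSERT INTO StudentClass (StudentId, ClassId) VALUES (1, 1), (3, 1), (3, 2);
""")

# Students in a class
students_in_class_1 = [
    r[0]
    for r in conn.execute(
        "SELECT StudentId FROM StudentClass WHERE ClassId = ? ORDER BY StudentId",
        (1,),
    )
]

# Classes read by a student
classes_of_student_3 = [
    r[0]
    for r in conn.execute(
        "SELECT ClassId FROM StudentClass WHERE StudentId = ? ORDER BY ClassId",
        (3,),
    )
]

# Classes named 'history' that contain student 3
history_with_student_3 = conn.execute(
    """
    SELECT c.Id, c.Name
    FROM Class AS c
    JOIN StudentClass AS sc ON sc.ClassId = c.Id
    WHERE sc.StudentId = ? AND c.Name = ?
    """,
    (3, "history"),
).fetchall()
```

Both directions of the relationship are answered from the same rows, so there is no second copy of the keys to keep in sync.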
I have used column-based relations a lot in my projects like:
CREATE TABLE `user` (
id INT AUTO_INCREMENT PRIMARY KEY,
usergroup INT
);
CREATE TABLE `usergroup` (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50)
);
however, at work it seems some people do it using table-based relations like this:
CREATE TABLE `user` (
id INT AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE `usergrouprelation` (
userid INT,
usergroupid INT
);
CREATE TABLE `usergroup` (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50)
);
I am wondering what the pros and cons of the two approaches are. And what is the official term for this?
The relationships are different.
Your first example is a one-to-many relationship: one group can have many users (a user can only be in one group).
Your second example is a many-to-many relationship: many groups can have many users (a user can be in more than one group, and a group can have more than one user).
That's the difference between the two. It's common practice to use an intermediate table to break up a many-to-many relationship.
If you have a many-to-many relationship, you must go the second way.
But if you have a one-to-many or many-to-one relationship, you can choose either of the two variants (the second is more expandable, though).
See more: Database relationships
There are no general pros or cons about that. I'd call your "column based relationship" a 1:n relation and your "table based relationship" a n:m relation.
1:n means every user can be related to zero or one user group, and each user group can be related to many users.
n:m means every user can be related to zero or more user groups, and vice versa.
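To make the difference concrete, here is a small sketch with both layouts side by side in SQLite via Python's sqlite3 module. The table names user_1n and user_nm, the keys added to the relation table, and the sample rows are hypothetical, chosen only so that both variants can live in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE usergroup (id INTEGER PRIMARY KEY, name TEXT);

-- 1:n variant: the group is a column on the user row
CREATE TABLE user_1n (
    id INTEGER PRIMARY KEY,
    usergroup INTEGER REFERENCES usergroup(id)
);

-- n:m variant: memberships live in a separate relation table
CREATE TABLE user_nm (id INTEGER PRIMARY KEY);
CREATE TABLE usergrouprelation (
    userid INTEGER NOT NULL REFERENCES user_nm(id),
    usergroupid INTEGER NOT NULL REFERENCES usergroup(id),
    PRIMARY KEY (userid, usergroupid)
);

-- Hypothetical sample data
INSERT INTO usergroup VALUES (1, 'admins'), (2, 'editors');
INSERT INTO user_1n VALUES (7, 1);
INSERT INTO user_nm VALUES (7);
INSERT INTO usergrouprelation VALUES (7, 1), (7, 2);
""")

# 1:n: a user row has room for exactly one group value.
groups_1n = [
    r[0]
    for r in conn.execute(
        "SELECT g.name FROM user_1n AS u "
        "JOIN usergroup AS g ON g.id = u.usergroup WHERE u.id = 7"
    )
]

# n:m: the same user can appear in several relation rows.
groups_nm = [
    r[0]
    for r in conn.execute(
        "SELECT g.name FROM usergrouprelation AS r "
        "JOIN usergroup AS g ON g.id = r.usergroupid "
        "WHERE r.userid = 7 ORDER BY g.name"
    )
]
```

In the first layout, putting user 7 in a second group would mean overwriting the first one; in the second layout it is just another row.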
Originally I had two tables in my DB, [Property] and [Employee].
Each employee can have one "Home Property" so the employee table has a HomePropertyID FK field to Property.
Later I needed to model the situation where despite having only one "Home Property" the employee did work at or cover for multiple properties.
So I created an [Employee2Property] table that has EmployeeID and PropertyID FK fields to model this many-to-many relationship.
Now I find that I need to create other many-to-many relationships between employees and properties. For example if there are multiple employees that are managers for a property or multiple employees that perform maintenance work at a property, etc.
My questions are:
Should I create separate many-to-many tables for each of these situations, or should I create one more table, such as [PropertyAssociationType], that lists the types of association an employee can have with a property, and add a FK field such as PropertyAssociationTypeID to [Employee2Property] that explains what the association is? I'm curious about the pros/cons, or whether there's another, better way.
Am I stupid and going about this all wrong?
Thanks for any suggestions :)
This is a very valid question. And the answer is: it depends.
The following things suggest using a single 'typed' M:N relationship:
you often want to process all employee-property relationships, independent of type
the number of associations is changing all the time, i.e. new types get invented.
an employee-property relationship sometimes changes its type.
If these statements are more wrong than right, you might be better off using separate relationships.
Create Table Employee
(
Id int not null Primary Key
, ....
)
Create Table Property
(
Id int not null Primary Key
, ....
)
Create Table Role
(
Name varchar(10) not null Primary Key
, ....
)
Into the roles table you would put things like "Manager" etc.
Create Table PropertyEmployeeRoles
(
PropertyId int not null
, EmployeeId int not null
, RoleName varchar(10) not null
, Constraint FK_PropertyEmployeeRoles_Properties
Foreign Key( PropertyId )
References dbo.Property( Id )
, Constraint FK_PropertyEmployeeRoles_Employees
Foreign Key( EmployeeId )
References dbo.Employee( Id )
, Constraint FK_PropertyEmployeeRoles_Roles
Foreign Key( RoleName )
References dbo.Role( Name )
, Constraint UK_PropertyEmployeeRoles Unique ( PropertyId, EmployeeId, RoleName )
)
In this way, the same employee could serve multiple roles on the same property. This structure would not work for situations where you needed to guarantee that there was one and only one of some item (e.g. HomeProperty), but it would allow you to expand the list of roles that an employee could have with respect to a given property.
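Here is a minimal sketch of how the role-typed table behaves, run on SQLite through Python's sqlite3 module. SQLite, the trimmed column lists and the sample rows are assumptions for illustration; the elided columns from the tables above are simply left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER PRIMARY KEY);
CREATE TABLE Property (Id INTEGER PRIMARY KEY);
CREATE TABLE Role (Name TEXT PRIMARY KEY);
CREATE TABLE PropertyEmployeeRoles (
    PropertyId INTEGER NOT NULL REFERENCES Property(Id),
    EmployeeId INTEGER NOT NULL REFERENCES Employee(Id),
    RoleName TEXT NOT NULL REFERENCES Role(Name),
    UNIQUE (PropertyId, EmployeeId, RoleName)
);
-- Hypothetical sample data
INSERT INTO Employee VALUES (1), (2);
INSERT INTO Property VALUES (100);
INSERT INTO Role VALUES ('Manager'), ('Maintenance');
INSERT INTO PropertyEmployeeRoles VALUES
    (100, 1, 'Manager'),
    (100, 1, 'Maintenance'),
    (100, 2, 'Manager');
""")

# All managers of property 100
managers = [
    r[0]
    for r in conn.execute(
        "SELECT EmployeeId FROM PropertyEmployeeRoles "
        "WHERE PropertyId = ? AND RoleName = ? ORDER BY EmployeeId",
        (100, "Manager"),
    )
]

# All roles employee 1 holds on property 100
roles_of_employee_1 = [
    r[0]
    for r in conn.execute(
        "SELECT RoleName FROM PropertyEmployeeRoles "
        "WHERE PropertyId = ? AND EmployeeId = ? ORDER BY RoleName",
        (100, 1),
    )
]

# The unique constraint stops the same (property, employee, role) triple
# from being recorded twice.
try:
    conn.execute("INSERT INTO PropertyEmployeeRoles VALUES (100, 1, 'Manager')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Adding a new kind of association is then an INSERT into Role rather than a new table.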
Two considerations should guide your choice.
How static will the list of potential types of associations be? You are already having to add new ones, so it seems that the answer here may be: not very. If there is a likelihood that this list will grow, stick with choice B (one table with an additional FK to an AssociationType table).
What are the access patterns for this data likely to be? If there will be a need to access the individual types of associations in isolation from all the other types, then multiple tables might be better. But I would only do this if the list of association types was also very static.
I have a table a with primary key id and a table b that represents a specialized version of a (it has all the same characteristics to track as a does, plus some specific to its b-ness--the latter are all that are stored in b). If I decide to represent this by having b's primary key be also a foreign key to a.id, what's the proper terminology for b in relation to a?
A real world example might be a person table with student and teacher add-on tables. A student might also be a teacher (a TA for example) but they're both the same person.
I would call it a 'child table' of a but I already use that as a synonym for 'detail table', like lines on a purchase order, for example.
Your design sounds like Concrete Table Inheritance.
I'd call table B a concrete table that extends table A.
The relationship is one-to-one.
Other answers have suggested storing only the columns specific to the extended table. This design would be called Class Table Inheritance.
OK, this is sort of off topic, but first things first: why does B have all of A's columns? It should only have the added columns, ESPECIALLY if you are referencing A with a foreign key.
"Add on" records are usually called "Detail(s)"
For example, let's say my Table A is "Cars"; my Table B would then be "CarDetails".
As Neil N said, you shouldn't have the columns in both places if you're referencing table A in table B through a foreign key.
What you're describing sounds a bit like a parallel to inheritance in object-oriented programming. Personally, I don't use any specific naming convention in this case. I name A what it is and I name B what it is. For example, I might have:
CREATE TABLE People
(
people_id INT NOT NULL,
first_name VARCHAR(40) NOT NULL,
last_name VARCHAR(40) NOT NULL,
...
CONSTRAINT PK_People PRIMARY KEY CLUSTERED (people_id)
)
GO
CREATE TABLE My_Application_Users
(
people_id INT NOT NULL,
user_name VARCHAR(20) NOT NULL,
security_level INT NOT NULL,
CONSTRAINT PK_My_Application_Users PRIMARY KEY CLUSTERED (people_id),
CONSTRAINT UI_My_Application_Users_user_name UNIQUE (user_name)
)
GO
This is just an example, so please don't tell me that my name columns are too long or too short or that they should allow NULLs, etc. ;)
what's the proper terminology for b in relation to a?
Table B is a child of Table A (the parent), because in order for a record to exist in the child, it must first exist in the parent.
Tables should be modeled as having either one-to-many or many-to-one relationships, depending on the context, and each of those can be either optional or required. A table that links two lists together relates to each of the tables involved in a many-to-one fashion. For example, with users, groups, and user_groups_xref, the user_groups_xref table can hold many rows that reference the same users record, and it has the same relationship to the groups table.
There's no point in one-to-one relationships - these should never be allowed to exist because it should only be one table.