What is the necessity of junction tables? - sql

I have implemented the following ways of storing relational topology:
1.A general junction relation table:
Table: Relation
Columns: id parent_type parent_id parent_prop child_type child_id child_prop
On which joins are not generally capable of being executed against by most sql engines.
2.Relation specific junction tables
Table: Class2Student
Columns: id parent_id parent_prop child_id child_prop
On which joins are capable of being executed against.
3.Storing lists/string maps of related objects in a text field on both bidirectional objects.
Class: Class
Class properties: id name students
Table columns: id name students_keys
Rows: 1 "history" [{type:Basic_student,id:1},{type:Advanced_student,id:3}]
To enable joins by the sql engines, it would be possible to write a custom module which would be made even easier if the contents of students_keys was simply [1,3], ie that a relation was to the explicit Student type.
The questions are the following in the context of:
I fail to see what the point of a junction table is. For example, I fail to see that any problems the following arguments for a junction table claim to relieve, actually exist:
Inability to logically correctly save a bidirectional relations (eg
there is no data orphaning in bidirectional relations or any
relations with a keys field, because one recursively saves and one can enforce
other operations (delete,update) quite easily)
Inability to join effectively
I am not soliciting opinions on your personal opinions on best practices or any cult-like statements on normalization.
The explicit question(s) are the following:
What are the instances where one would want to query a junction table that is not provided by querying a owning object's keys field?
What are logical implementation problems in the context of computation provided by the sql engine where the junction table is preferable?
The only implementation difference with regards to a junction table vs a keys fields is the following:
When searching for a query of the following nature you would need to match against the keys field with either a custom indexing implementation or some other reasonable implementation:
class_dao.search({students:advanced_student_3,name:"history"});
search for Classes that have a particular student and name "history"
As opposed to searching the indexed columns of the junction table and then selecting the approriate Classes.
I have been unable to identify answers why a junction table is logically preferable for quite literally any reason. I am not claiming this is the case or do I have a religious preference one way or another as evidenced by the fact that I implemented multiple ways of achieving this. My problem is I do not know what they are.

The way I see it, you have have several entities
CREATE TABLE StudentType
(
Id Int PRIMARY KEY,
Name NVarChar(50)
);
INSERT StudentType VALUES
(
(1, 'Basic'),
(2, 'Advanced'),
(3, 'SomeOtherCategory')
);
CREATE TABLE Student
(
Id Int PRIMARY KEY,
Name NVarChar(200),
OtherAttributeCommonToAllStudents Int,
Type Int,
CONSTRAINT FK_Student_StudentType
FOREIGN KEY (Type) REFERENCES StudentType(Id)
)
CREATE TABLE StudentAdvanced
(
Id Int PRIMARY KEY,
AdvancedOnlyAttribute Int,
CONSTRIANT FK_StudentAdvanced_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
CREATE TABLE StudentSomeOtherCategory
(
Id Int PRIMARY KEY,
SomeOtherCategoryOnlyAttribute Int,
CONSTRIANT FK_StudentSomeOtherCategory_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
Any attributes that are common to all students have columns on the Student table.
Types of student that have extra attributes are added to the StudentType table.
Each extra student type gets a Student<TypeName> table to store its specific attributes. These tables have an optional one-to-one relationship with Student.
I think that your "straw-man" junction table is a partial implementation of an EAV anti-pattern, the only time this is sensible, is when you can't know what attributes you need to model, i.e. your data will be entirely unstructured. When this is a real requirment, relational databases start to look less desirable. On those occasions consider a NOSQL/Document database alternative.
A junction table would be useful in the following scenario.
Say we add a Class entity to the model.
CREATE TABLE Class
(
Id Int PRIMARY KEY,
...
)
Its concievable that we would like to store the many-to-many realtionship between students and classes.
CREATE TABLE Registration
(
Id Int PRIMARY KEY,
StudentId Int,
ClassId Int,
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
This entity would be the right place to store attributes that relate specifically to a student's registration to a class, perhaps a completion flag for instance. Other data would naturally relate to this junction, pehaps a class specific attendance record or a grade history.
If you don't relate Class and Student in this way, how would you select both, all the students in a class, and all the classes a student reads. Performance wise, this is easily optimised by indices on key columns.
When a many-to-many realtionships exists without any attributes I agree that logically, the junction table needn't exist. However, in a relational database, junction tables are still a useful physical implmentaion, perhaps like this,
CREATE TABLE StudentClass
(
StudentId Int,
ClassId Int,
CONSTRAINT PK_StudentClass PRIMARY KEY (ClassId, StudentId),
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
this allows simple queries like
// students in a class?
SELECT StudentId
FROM StudentClass
WHERE ClassId = #classId
// classes read by a student?
SELECT ClassId
FROM StudentClass
WHERE StudentId = #studentId
additionaly, this enables a simple way to manage the relationship, partially or completely from either aspect, that will be familar to relational database developers and sargeable by query optimisers.

Related

SQL Server use same Guid as primary key in 2 tables

We have 2 tables with a 1:1 relationship.
1 table should reference the other, typically one would use a FK relationship.
Since there is a 1:1 relationship, we could also directly use the same Guid in both tables as primary key.
Additional info: the data is split into 2 tables since the data is rather separate, think "person" and "address" - but in a world where there is a clear 1:1 relationship between the 2.
As per the tags I was suggested I assume this is called "shared primary key".
Would using the same Guid as PK in 2 tables have any ill effects?
To consolidate info from comments into answer...
No, there are no ill effects of two tables sharing PK.
You will still need to create a FK reference from 2nd table, FK column will be the same as PK column.
Though, your example of "Person" and "Address" in 1:1 situation is not best suited. Common usage of this practice is entities that extend one another. For example: Table "User" can hold common info on all users, but tables "Candidate" and "Recruiter" can each expand on it, and all tables can share same PK. Programming language representation would also be classes that extends one another.
Other (similar) example would be table that store more detailed info than the base table like "User" and "UserDetails". It's 1:1 and no need to introduce additional PK column.
Code sample where PK is also a FK:
CREATE TABLE [User]
(
id INT PRIMARY KEY
, name NVARCHAR(100)
);
CREATE TABLE [Candidate]
(
id INT PRIMARY KEY FOREIGN KEY REFERENCES [User](id)
, actively_looking BIT
);
CREATE TABLE [Recruiter]
(
id INT PRIMARY KEY
, currently_hiring BIT
, FOREIGN KEY (id) REFERENCES [User](id)
);
PS: As mentioned GUID is not best suited column for PK due to performance issues, but that's another topic.

query with SQL m:n relationships

I have a quick question with respect to many to many relationships in sql.
So theoretically i understand that if 2 entities in an ER model have a M:N relationship between them, we have to split that into 2 1:N relationships with the inclusion of an intersection/lookup table which has a composite primary key from both the parent tables. But, my question here is , in addition to the composite primary key, can there be any other extra column added to the composite table which are not in any of the 2 parent tables ? (apart from intersectionTableId, table1ID, table2ID) a 4rth column which is entirely new and not in any of the 2 parent tables ? Please let me know.
In a word - yes. It's a common practice to denote properties of the relationship between the two entities.
E.g., consider you have a database storing the details of people and the sports teams they like:
CREATE TABLE person (
id INT PRIMARY KEY,
first_name VARCHAR(10),
last_name VARCHAR(10)
);
CREATE TABLE team (
id INT PRIMARY KEY,
name VARCHAR(10)
);
A person may like more than one team, which is your classic M:N relationship table. But, you could also add some details to this entity, such as when did a person start liking a team:
CREATE TABLE fandom (
person_id INT NOT NULL REFERENCES person(id),
team_id INT NOT NULL REFERENCES team(id),
fandom_started DATE,
PRIMARY KEY (person_id, team_id)
);
Yes, you can do that by modeling the "relationship" table yourself explicitly (just like your other entities).
Here are some posts about exactly that question.
Create code first, many to many, with additional fields in association table
Entity Framework CodeFirst many to many relationship with additional information

Many to many table design question

Originally I had two tables in my DB, [Property] and [Employee].
Each employee can have one "Home Property" so the employee table has a HomePropertyID FK field to Property.
Later I needed to model the situation where despite having only one "Home Property" the employee did work at or cover for multiple properties.
So I created an [Employee2Property] table that has EmployeeID and PropertyID FK fields to model this many-to-many relationship.
Now I find that I need to create other many-to-many relationships between employees and properties. For example if there are multiple employees that are managers for a property or multiple employees that perform maintenance work at a property, etc.
My questions are:
Should I create separate many-to-many tables for each of these situations or should I just create one more table like [PropertyAssociatonType] that lists the types of associations an employee can have with a property and just add a FK field to [Employee2Property] such as PropertyAssociationTypeID that explains what the association is? I'm curious about the pros/cons or if there's another better way.
Am I stupid and going about this all wrong?
Thanks for any suggestions :)
This is a very valid question. And the answer is: it depends
The following things suggest using a single 'typed' M:N relationship:
you often want to process all employee-property relationships, independent of type
the number of associations is changing all the time, i.e. new types get invented.
a employee property relationship sometimes changes its type.
If these statements are more wrong then right, you might better be of using separate relationships.
Create Table Employee
(
Id int not null Primary Key
, ....
)
Create Table Property
(
Id int not null Primary Key
, ....
)
Create Table Role
(
Name varchar(10) not null Primary Key
, ....
)
Into the roles table you would put things like "Manager" etc.
Create Table PropertyEmployeeRoles
(
PropertyId int not null
, EmployeeId int not null
, RoleName varchar(10) not null
, Constraint FK_PropertyEmployeeRoles_Properties
Foreign Key( PropertyId )
References dbo.Properties( Id )
, Constraint FK_PropertyEmployeeRoles_Employees
Foreign Key( EmployeeId )
References dbo.Employees( Id )
, Constraint FK_PropertyEmployeeRoles_Roles
Foreign Key( RoleName )
References dbo.Roles( Name )
, Constraint UK_PropertyEmployeeRoles Unique ( PropertyId, EmployeeId, RoleName )
)
In this way, the same employee could serve multiple roles on the same property. This structure would not work for situations where you needed to guarantee that there was one and only of some item (e.g. HomeProperty) but would allow you to expand the list of roles that an employee could have with respect to a given property.
Two considerations should guide your choice.
How static will the list of potential types of associations be? You are already having to add new ones, so it seems that the answer here may be - not very. If there is likelihood that this list will grow, stick with choice B (one table with an additonal FK to an AssociationType table)
What are the access patterns for this data likely to be? If there will be a need to access the individual types of associations, isolated from all the other types, then multiple tables might be better. But I would only do this is the list of associatipon types was also very static

If I have two tables in SQL with a many-many relationship, do I need to create an additional table?

Take these tables for example.
Item
id
description
category
Category
id
description
An item can belong to many categories and a category obviously can be attached to many items.
How would the database be created in this situation? I'm not sure. Someone said create a third table, but do I need to do that? Do I literally do a
create table bla bla
for the third table?
Yes, you need to create a third table with mappings of ids, something with columns like:
item_id (Foreign Key)
category_id (Foreign Key)
edit: you can treat item_id and category_id as a primary key, they uniquely identify the record alone. In some applications I've found it useful to include an additional numeric identifier for the record itself, and you might optionally include one if you're so inclined
Think of this table as a listing of all the mappings between Items and Categories. It's concise, and it's easy to query against.
edit: removed (unnecessary) primary key.
Yes, you cannot form a third-normal-form many-to-many relationship between two tables with just those two tables. You can form a one-to-many (in one of the two directions) but in order to get a true many-to-many, you need something like:
Item
id primary key
description
Category
id primary key
description
ItemCategory
itemid foreign key references Item(id)
categoryid foreign key references Category(id)
You do not need a category in the Item table unless you have some privileged category for an item which doesn't seem to be the case here. I'm also not a big fan of introducing unnecessary primary keys when there is already a "real" unique key on the joining table. The fact that the item and category IDs are already unique means that the entire record for the ItemCategory table will be unique as well.
Simply monitor the performance of the ItemCategory table using your standard tools. You may require an index on one or more of:
itemid
categoryid
(itemid,categoryid)
(categoryid,itemid)
depending on the queries you use to join the data (and one of the composite indexes would be the primary key).
The actual syntax for the entire job would be along the lines of:
create table Item (
id integer not null primary key,
description varchar(50)
);
create table Category (
id integer not null primary key,
description varchar(50)
);
create table ItemCategory (
itemid integer references Item(id),
categoryid integer references Category(id),
primary key (itemid,categoryid)
);
There's other sorts of things you should consider, such as making your ID columns into identity/autoincrement columns, but that's not directly relevant to the question at hand.
Yes, you need a "join table". In a one-to-many relationship, objects on the "many" side can have an FK reference to objects on the "one" side, and this is sufficient to determine the entire relationship, since each of the "many" objects can only have a single "one" object.
In a many-to-many relationship, this is no longer sufficient because you can't stuff multiple FK references in a single field. (Well, you could, but then you would lose atomicity of data and all of the nice things that come with a relational database).
This is where a join table comes in - for every relationship between an Item and a Category, the relation is represented in the join table as a pair: Item.id x Category.id.

What's the proper name for an "add-on" table?

I have a table a with primary key id and a table b that represents a specialized version of a (it has all the same characteristics to track as a does, plus some specific to its b-ness--the latter are all that are stored in b). If I decide to represent this by having b's primary key be also a foreign key to a.id, what's the proper terminology for b in relation to a?
A real world example might be a person table with student and teacher add-on tables. A student might also be a teacher (a TA for example) but they're both the same person.
I would call it a 'child table' of a but I already use that as a synonym for 'detail table', like lines on a purchase order, for example.
Your design sounds like Concrete Table Inheritance.
I'd call table B a concrete table that extends table A.
The relationship is one-to-one.
Other answers have suggested storing only the columns specific to the extended table. This design would be called Class Table Inheritance.
Ok this is sort of off topic but first things first, why does B have all of A's columns? It should only have the added columns, ESPECIALLY if you are referencing A with a foriegn key.
"Add on" records are usually called "Detail(s)"
For example, lets say my Table A is "Cars" my Table B would be "CarDetails"
As Neil N said, you shouldn't have the columns in both places if you're referencing table A in table B through a foreign key.
What your describing sounds a bit like a parallel to inheritance in object oriented programming. Personally, I don't use any specific naming convention in this case. I name A what it is and I name B what it is. For example, I might have:
CREATE TABLE People
(
people_id INT NOT NULL,
first_name VARCHAR(40) NOT NULL,
last_name VARCHAR(40) NOT NULL,
...
CONSTRAINT PK_People PRIMARY KEY CLUSTERED (people_id)
)
GO
CREATE TABLE My_Application_Users
(
people_id INT NOT NULL,
user_name VARCHAR(20) NOT NULL,
security_level INT NOT NULL,
CONSTRAINT PK_My_Application_Users PRIMARY KEY CLUSTERED (people_id),
CONSTRAINT UI_My_Application_Users_user_name UNIQUE (user_name)
)
GO
This is just an example, so please don't tell me that my name columns are too long or too short or that they should allow NULLs, etc. ;)
what's the proper terminology for b in relation to a?
Table B is a child of Table A (the parent), because in order for a record to exist in the child, it must first exist in the parent.
Tables should be modeled based on either having one-to-many or many-to-one relationships depending on the context, and of those options they can be either optional or required. Tables that link two sets of lists together will relate to other tables in a many-to-one fashion for every table involved. For example, users, groups, and user_groups_xref - the user_groups_xref can support numerous specific user instances of a user records, and the same relationship to the groups table.
There's no point in one-to-one relationships - these should never be allowed to exist because it should only be one table.