SQL server entity reference between tables - sql

I am new to SQL server and I am trying to learn the power of SQL :)
I need help with understanding how different tables can interact with one and other within the same database. I know that an entity can have a foreign key with a reference to an attribute of a different table. But how can I have one table with a reference to many different tables?
Lets take an example. I have a simple database with a Car table, Bike table and Boat table. Each of these tables have the attributes Id, Make, Model and Year. Now I want to add a Service table with the attributes IdService and ServiceNote to the database where I want each IdService to refere to a specific car, bike or boat from one of the three tables.
How can I do this? Please see the simple diagram I have attached for better understanding.

You just include the columns you want. Here is one method:
create table services (
servicesid int identity(1, 1) primary key,
bikeid int references bikes(bikeid),
boatid int references boats(boatid),
carid int references cars(carid),
servicenote nvarchar(max),
check ((bikeid is not null and boatid is null and carid is null) or
(bikeid is null and boatid is not null and carid is null) or
(bikeid is null and boatid is null and carid is not null)
)
);
This structure allows you to properly declare the foreign key relationships. The check constraint also ensures that each row only talks to one entity.
Some databases support inheritance in tables. That would allow you to declare a "super" type for bikes, boats, and cars, all sharing the same id. Although you can do that without inheritance, it is a bit cumbersome. For a handful of connections, explicit references are often sufficient (although wasteful in the sense that the empty values do take up space on the data pages).

Related

Problems on having a field that will be null very often on a table in SQL Server

I have a column that sometimes will be null. This column is also a foreign key, so I want to know if I'll have problems with performance or with data consistency if this column will have weight
I know its a foolish question but I want to be sure.
There is no problem necessarily with this, other than it is likely indication that you might have poorly normalized design. There might be performance implications due to the way indexes are structured and the sparseness of the column with nulls, but without knowing your structure or intended querying scenarios any conclusions one might draw would be pure speculation.
A better solution might be a shared primary key where table A has a primary key, and there is zero or one records in B with the same primary key.
If table A can have one or zero B, but more than one A can refer to B, then what you have is a one to many relationship. This can be represented as Pieter laid out in his answer. This allows multiple A records to refer to the same B, and in turn each B may optionally refer to an A.
So you see there are two optional structures to address this problem, and choosing each is not guesswork. There is a distinct rational between why you would choose one or the other, but it depends on the nature of your relationships you are modelling.
Instead of this design:
create table Master (
ID int identity not null primary key,
DetailID int null references Detail(ID)
)
go
create table Detail (
ID int identity not null primary key
)
go
consider this instead
create table Master (
ID int identity not null primary key
)
go
create table Detail (
ID int identity not null primary key,
MasterID int not null references Master(ID)
)
go
Now the Foreign Key is never null, rather the existence (or not) of the Detail record indicates whether it exists.
If a Detail can exist for multiple records, create a mapping table to manage the relationship.

What is the necessity of junction tables?

I have implemented the following ways of storing relational topology:
1.A general junction relation table:
Table: Relation
Columns: id parent_type parent_id parent_prop child_type child_id child_prop
On which joins are not generally capable of being executed against by most sql engines.
2.Relation specific junction tables
Table: Class2Student
Columns: id parent_id parent_prop child_id child_prop
On which joins are capable of being executed against.
3.Storing lists/string maps of related objects in a text field on both bidirectional objects.
Class: Class
Class properties: id name students
Table columns: id name students_keys
Rows: 1 "history" [{type:Basic_student,id:1},{type:Advanced_student,id:3}]
To enable joins by the sql engines, it would be possible to write a custom module which would be made even easier if the contents of students_keys was simply [1,3], ie that a relation was to the explicit Student type.
The questions are the following in the context of:
I fail to see what the point of a junction table is. For example, I fail to see that any problems the following arguments for a junction table claim to relieve, actually exist:
Inability to logically correctly save a bidirectional relations (eg
there is no data orphaning in bidirectional relations or any
relations with a keys field, because one recursively saves and one can enforce
other operations (delete,update) quite easily)
Inability to join effectively
I am not soliciting opinions on your personal opinions on best practices or any cult-like statements on normalization.
The explicit question(s) are the following:
What are the instances where one would want to query a junction table that is not provided by querying a owning object's keys field?
What are logical implementation problems in the context of computation provided by the sql engine where the junction table is preferable?
The only implementation difference with regards to a junction table vs a keys fields is the following:
When searching for a query of the following nature you would need to match against the keys field with either a custom indexing implementation or some other reasonable implementation:
class_dao.search({students:advanced_student_3,name:"history"});
search for Classes that have a particular student and name "history"
As opposed to searching the indexed columns of the junction table and then selecting the approriate Classes.
I have been unable to identify answers why a junction table is logically preferable for quite literally any reason. I am not claiming this is the case or do I have a religious preference one way or another as evidenced by the fact that I implemented multiple ways of achieving this. My problem is I do not know what they are.
The way I see it, you have have several entities
CREATE TABLE StudentType
(
Id Int PRIMARY KEY,
Name NVarChar(50)
);
INSERT StudentType VALUES
(
(1, 'Basic'),
(2, 'Advanced'),
(3, 'SomeOtherCategory')
);
CREATE TABLE Student
(
Id Int PRIMARY KEY,
Name NVarChar(200),
OtherAttributeCommonToAllStudents Int,
Type Int,
CONSTRAINT FK_Student_StudentType
FOREIGN KEY (Type) REFERENCES StudentType(Id)
)
CREATE TABLE StudentAdvanced
(
Id Int PRIMARY KEY,
AdvancedOnlyAttribute Int,
CONSTRIANT FK_StudentAdvanced_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
CREATE TABLE StudentSomeOtherCategory
(
Id Int PRIMARY KEY,
SomeOtherCategoryOnlyAttribute Int,
CONSTRIANT FK_StudentSomeOtherCategory_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
Any attributes that are common to all students have columns on the Student table.
Types of student that have extra attributes are added to the StudentType table.
Each extra student type gets a Student<TypeName> table to store its specific attributes. These tables have an optional one-to-one relationship with Student.
I think that your "straw-man" junction table is a partial implementation of an EAV anti-pattern, the only time this is sensible, is when you can't know what attributes you need to model, i.e. your data will be entirely unstructured. When this is a real requirment, relational databases start to look less desirable. On those occasions consider a NOSQL/Document database alternative.
A junction table would be useful in the following scenario.
Say we add a Class entity to the model.
CREATE TABLE Class
(
Id Int PRIMARY KEY,
...
)
Its concievable that we would like to store the many-to-many realtionship between students and classes.
CREATE TABLE Registration
(
Id Int PRIMARY KEY,
StudentId Int,
ClassId Int,
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
This entity would be the right place to store attributes that relate specifically to a student's registration to a class, perhaps a completion flag for instance. Other data would naturally relate to this junction, pehaps a class specific attendance record or a grade history.
If you don't relate Class and Student in this way, how would you select both, all the students in a class, and all the classes a student reads. Performance wise, this is easily optimised by indices on key columns.
When a many-to-many realtionships exists without any attributes I agree that logically, the junction table needn't exist. However, in a relational database, junction tables are still a useful physical implmentaion, perhaps like this,
CREATE TABLE StudentClass
(
StudentId Int,
ClassId Int,
CONSTRAINT PK_StudentClass PRIMARY KEY (ClassId, StudentId),
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)
this allows simple queries like
// students in a class?
SELECT StudentId
FROM StudentClass
WHERE ClassId = #classId
// classes read by a student?
SELECT ClassId
FROM StudentClass
WHERE StudentId = #studentId
additionaly, this enables a simple way to manage the relationship, partially or completely from either aspect, that will be familar to relational database developers and sargeable by query optimisers.

SQL - NULL foreign key

Please have a look at the database design below:
create table Person (id int identity, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
Currently all persons have an invoiceID i.e. the InvoiceID is not null.
I want to extend the database so that some person does not have an Invoice. The original developer hated nulls and never uses them. I want to be consistent so I am wondering if there are other patterns I can use to extend the database to meet this requirement. How can this be approached without using nulls?
Please note that the two tables above are for illustration purposes. They are not the actual tables.
NULL is a very important feature in databases and programming in general. It is significantly different from being zero or any other value. It is most commonly used to signify absence of value (though it also can mean unknown value, but that's less used as the interpretation). If some people do not have an invoice, then you should truly allow NULL, as that matches your desired Schema
A common pattern would be to store that association in a separate table.
Person: Id
Invoice: Id
Assoc: person_id, assoc_id
Then if a person doesn't have an invoice, you simply don't have a row. This approach also allows a person to have more than one invoice id which might make sense.
The only way to represent the optional relationship while avoiding nulls is to use another table, as some other answers have suggested. Then the absence of a row for a given Person indicates the person has no Invoice. You can enforce a 1:1 relationship between this table and the Person table by making person_id be the primary or unique key:
CREATE TABLE PersonInvoice (
person_id INT NOT NULL PRIMARY KEY,
invoice_id INT NOT NULL,
FOREIGN KEY (person_id) REFERENCES Person(id),
FOREIGN KEY (invoice_id) REFERENCES Invoice(id)
);
If you want to permit each person to have multiple invoices, you can declare the primary key as the pair of columns instead.
But this solution is to meet your requirement to avoid NULL. This is an artificial requirement. NULL has a legitimate place in a data model.
Some relational database theorists like Chris Date eschew NULL, explaining that the existence of NULL leads to some troubling logical anomalies in relational logic. For this camp, the absence of a row as shown above is a better way to represent missing data.
But other theorists, including E. F. Codd who wrote the seminal paper on relational theory, acknowledged the importance of a placeholder that means either "not known" or "not applicable." Codd even proposed in a 1990 book that SQL needed two placeholders, one for "missing but applicable" (i.e. unknown), and the other for "missing but inapplicable."
To me, the anomalies we see when using NULL in certain ways are like the undefined results we see in arithmetic when we divide by zero. The solution is: don't do that.
But certainly we shouldn't use any non-NULL value like 0 or '' (empty string) to represent missing data. And likewise we shouldn't use a NULL as if it were an ordinary scalar value.
I wrote more about NULL in a chapter titled "Fear of the Unknown" in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You need to move the invoice/person relation to another table.
You end up with
create table Person (id int person_identity)
create table PersonInvoice (id int person_id, InvoiceID int not null)
create table Invoice (id int identity, date datetime)
You need this for some databases to allow in InvoiceId to be a foreign key as some do not allow NULLS in a foreign key.
If a person only can have one invoice then PersonInvoice can have a unique constraint on the person_id as well as the two columns together. You can also enforce having a single person for a invoice by adding a unique constraint to the invoiceID field.

Generic Database table design

Just trying to figure out the best way to design my table for the following scenario:
I have several areas in my system (documents, projects, groups and clients) and each of these can have comments logged against them.
My question is should I have one table like this:
CommentID
DocumentID
ProjectID
GroupID
ClientID
etc
Where only one of the ids will have data and the rest will be NULL or should I have a separate CommentType table and have my comments table like this:
CommentID
CommentTypeID
ResourceID (this being the id of the project/doc/client)
etc
My thoughts are that option 2 would be more efficient from an indexing point of view. Is this correct?
Option 2 is not a good solution for a relational database. It's called polymorphic associations (as mentioned by #Daniel Vassallo) and it breaks the fundamental definition of a relation.
For example, suppose you have a ResourceId of 1234 on two different rows. Do these represent the same resource? It depends on whether the CommentTypeId is the same on these two rows. This violates the concept of a type in a relation. See SQL and Relational Theory by C. J. Date for more details.
Another clue that it's a broken design is that you can't declare a foreign key constraint for ResourceId, because it could point to any of several tables. If you try to enforce referential integrity using triggers or something, you find yourself rewriting the trigger every time you add a new type of commentable resource.
I would solve this with the solution that #mdma briefly mentions (but then ignores):
CREATE TABLE Commentable (
ResourceId INT NOT NULL IDENTITY,
ResourceType INT NOT NULL,
PRIMARY KEY (ResourceId, ResourceType)
);
CREATE TABLE Documents (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 1),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
CREATE TABLE Projects (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 2),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now each resource type has its own table, but the serial primary key is allocated uniquely by Commentable. A given primary key value can be used only by one resource type.
CREATE TABLE Comments (
CommentId INT IDENTITY PRIMARY KEY,
ResourceId INT NOT NULL,
ResourceType INT NOT NULL,
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now Comments reference Commentable resources, with referential integrity enforced. A given comment can reference only one resource type. There's no possibility of anomalies or conflicting resource ids.
I cover more about polymorphic associations in my presentation Practical Object-Oriented Models in SQL and my book SQL Antipatterns.
Read up on database normalization.
Nulls in the way you describe would be a big indication that the database isn't designed properly.
You need to split up all your tables so that the data held in them is fully normalized, this will save you a lot of time further down the line guaranteed, and it's a lot better practice to get into the habit of.
From a foreign key perspective, the first example is better because you can have multiple foreign key constraints on a column but the data has to exist in all those references. It's also more flexible if the business rules change.
To continue from #OMG Ponies' answer, what you describe in the second example is called a Polymorphic Association, where the foreign key ResourceID may reference rows in more than one table. However in SQL databases, a foreign key constraint can only reference exactly one table. The database cannot enforce the foreign key according to the value in CommentTypeID.
You may be interested in checking out the following Stack Overflow post for one solution to tackle this problem:
MySQL - Conditional Foreign Key Constraints
The first approach is not great, since it is quite denormalized. Each time you add a new entity type, you need to update the table. You may be better off making this an attribute of document - I.e. store the comment inline in the document table.
For the ResourceID approach to work with referential integrity, you will need to have a Resource table, and a ResourceID foreign key in all of your Document, Project etc.. entities (or use a mapping table.) Making "ResourceID" a jack-of-all-trades, that can be a documentID, projectID etc.. is not a good solution since it cannot be used for sensible indexing or foreign key constraint.
To normalize, you need to the comment table into one table per resource type.
Comment
-------
CommentID
CommentText
...etc
DocumentComment
---------------
DocumentID
CommentID
ProjectComment
--------------
ProjectID
CommentID
If only one comment is allowed, then you add a unique constraint on the foreign key for the entity (DocumentID, ProjectID etc.) This ensures that there can only be one row for the given item and so only one comment. You can also ensure that comments are not shared by using a unique constraint on CommentID.
EDIT: Interestingly, this is almost parallel to the normalized implementation of ResourceID - replace "Comment" in the table name, with "Resource" and change "CommentID" to "ResourceID" and you have the structure needed to associate a ResourceID with each resource. You can then use a single table "ResourceComment".
If there are going to be other entities that are associated with any type of resource (e.g. audit details, access rights, etc..), then using the resource mapping tables is the way to go, since it will allow you to add normalized comments and any other resource related entities.
I wouldn't go with either of those solutions. Depending on some of the specifics of your requirements you could go with a super-type table:
CREATE TABLE Commentable_Items (
commentable_item_id INT NOT NULL,
CONSTRAINT PK_Commentable_Items PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Projects (
commentable_item_id INT NOT NULL,
... (other project columns)
CONSTRAINT PK_Projects PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Documents (
commentable_item_id INT NOT NULL,
... (other document columns)
CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (commentable_item_id))
GO
If the each item can only have one comment and comments are not shared (i.e. a comment can only belong to one entity) then you could just put the comments in the Commentable_Items table. Otherwise you could link the comments off of that table with a foreign key.
I don't like this approach very much in your specific case though, because "having comments" isn't enough to put items together like that in my mind.
I would probably go with separate Comments tables (assuming that you can have multiple comments per item - otherwise just put them in your base tables). If a comment can be shared between multiple entity types (i.e., a document and a project can share the same comment) then have a central Comments table and multiple entity-comment relationship tables:
CREATE TABLE Comments (
comment_id INT NOT NULL,
comment_text NVARCHAR(MAX) NOT NULL,
CONSTRAINT PK_Comments PRIMARY KEY CLUSTERED (comment_id))
GO
CREATE TABLE Document_Comments (
document_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Document_Comments PRIMARY KEY CLUSTERED (document_id, comment_id))
GO
CREATE TABLE Project_Comments (
project_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Project_Comments PRIMARY KEY CLUSTERED (project_id, comment_id))
GO
If you want to constrain comments to a single document (for example) then you could add a unique index (or change the primary key) on the comment_id within that linking table.
It's all of these "little" decisions that will affect the specific PKs and FKs. I like this approach because each table is clear on what it is. In databases that's usually better then having "generic" tables/solutions.
Of the options you give, I would go for number 2.
Option 2 is a good way to go. The issue that I see with that is you are putting the resouce key on that table. Each of the IDs from the different resources could be duplicated. When you join resources to the comments you will more than likely come up with comments that do not belong to that particular resouce. This would be considered a many to many join. I would think a better option would be to have your resource tables, the comments table, and then tables that cross reference the resource type and the comments table.
If you carry the same sort of data about all comments regardless of what they are comments about, I'd vote against creating multiple comment tables. Maybe a comment is just "thing it's about" and text, but if you don't have other data now, it's likely you will: date the comment was entered, user id of person who made it, etc. With multiple tables, you have to repeat all these column definitions for each table.
As noted, using a single reference field means that you could not put a foreign key constraint on it. This is too bad, but it doesn't break anything, it just means you have to do the validation with a trigger or in code. More seriously, joins get difficult. You can just say "from comment join document using (documentid)". You need a complex join based on the value of the type field.
So while the multiple pointer fields is ugly, I tend to think that's the right way to go. I know some db people say there should never be a null field in a table, that you should always break it off into another table to prevent that from happening, but I fail to see any real advantage to following this rule.
Personally I'd be open to hearing further discussion on pros and cons.
Pawnshop Application:
I have separate tables for Loan, Purchase, Inventory & Sales transactions.
Each tables rows are joined to their respective customer rows by:
customer.pk [serial] = loan.fk [integer];
= purchase.fk [integer];
= inventory.fk [integer];
= sale.fk [integer];
I have consolidated the four tables into one table called "transaction", where a column:
transaction.trx_type char(1) {L=Loan, P=Purchase, I=Inventory, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic transaction table where for example:
transaction.main_amount DECIMAL(7,2)
in a loan transaction holds the pawn amount,
in a purchase holds the purchase price,
in inventory and sale holds sale price.
This is clearly a denormalized design, but has made programming alot easier and improved performance. Any type of transaction can now be performed from within one screen, without the need to change to different tables.

Many to many table design question

Originally I had two tables in my DB, [Property] and [Employee].
Each employee can have one "Home Property" so the employee table has a HomePropertyID FK field to Property.
Later I needed to model the situation where despite having only one "Home Property" the employee did work at or cover for multiple properties.
So I created an [Employee2Property] table that has EmployeeID and PropertyID FK fields to model this many-to-many relationship.
Now I find that I need to create other many-to-many relationships between employees and properties. For example if there are multiple employees that are managers for a property or multiple employees that perform maintenance work at a property, etc.
My questions are:
Should I create separate many-to-many tables for each of these situations or should I just create one more table like [PropertyAssociatonType] that lists the types of associations an employee can have with a property and just add a FK field to [Employee2Property] such as PropertyAssociationTypeID that explains what the association is? I'm curious about the pros/cons or if there's another better way.
Am I stupid and going about this all wrong?
Thanks for any suggestions :)
This is a very valid question. And the answer is: it depends
The following things suggest using a single 'typed' M:N relationship:
you often want to process all employee-property relationships, independent of type
the number of associations is changing all the time, i.e. new types get invented.
a employee property relationship sometimes changes its type.
If these statements are more wrong then right, you might better be of using separate relationships.
Create Table Employee
(
Id int not null Primary Key
, ....
)
Create Table Property
(
Id int not null Primary Key
, ....
)
Create Table Role
(
Name varchar(10) not null Primary Key
, ....
)
Into the roles table you would put things like "Manager" etc.
Create Table PropertyEmployeeRoles
(
PropertyId int not null
, EmployeeId int not null
, RoleName varchar(10) not null
, Constraint FK_PropertyEmployeeRoles_Properties
Foreign Key( PropertyId )
References dbo.Properties( Id )
, Constraint FK_PropertyEmployeeRoles_Employees
Foreign Key( EmployeeId )
References dbo.Employees( Id )
, Constraint FK_PropertyEmployeeRoles_Roles
Foreign Key( RoleName )
References dbo.Roles( Name )
, Constraint UK_PropertyEmployeeRoles Unique ( PropertyId, EmployeeId, RoleName )
)
In this way, the same employee could serve multiple roles on the same property. This structure would not work for situations where you needed to guarantee that there was one and only of some item (e.g. HomeProperty) but would allow you to expand the list of roles that an employee could have with respect to a given property.
Two considerations should guide your choice.
How static will the list of potential types of associations be? You are already having to add new ones, so it seems that the answer here may be - not very. If there is likelihood that this list will grow, stick with choice B (one table with an additonal FK to an AssociationType table)
What are the access patterns for this data likely to be? If there will be a need to access the individual types of associations, isolated from all the other types, then multiple tables might be better. But I would only do this is the list of associatipon types was also very static