Generic Database table design - sql

Just trying to figure out the best way to design my table for the following scenario:
I have several areas in my system (documents, projects, groups and clients) and each of these can have comments logged against them.
My question is should I have one table like this:
CommentID
DocumentID
ProjectID
GroupID
ClientID
etc
Where only one of the IDs will have data and the rest will be NULL? Or should I have a separate CommentType table and have my comments table like this:
CommentID
CommentTypeID
ResourceID (this being the id of the project/doc/client)
etc
My thoughts are that option 2 would be more efficient from an indexing point of view. Is this correct?

Option 2 is not a good solution for a relational database. It's called polymorphic associations (as mentioned by @Daniel Vassallo) and it breaks the fundamental definition of a relation.
For example, suppose you have a ResourceId of 1234 on two different rows. Do these represent the same resource? It depends on whether the CommentTypeId is the same on these two rows. This violates the concept of a type in a relation. See SQL and Relational Theory by C. J. Date for more details.
Another clue that it's a broken design is that you can't declare a foreign key constraint for ResourceId, because it could point to any of several tables. If you try to enforce referential integrity using triggers or something, you find yourself rewriting the trigger every time you add a new type of commentable resource.
I would solve this with the solution that @mdma briefly mentions (but then ignores):
CREATE TABLE Commentable (
ResourceId INT NOT NULL IDENTITY,
ResourceType INT NOT NULL,
PRIMARY KEY (ResourceId, ResourceType)
);
CREATE TABLE Documents (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 1),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
CREATE TABLE Projects (
ResourceId INT NOT NULL,
ResourceType INT NOT NULL CHECK (ResourceType = 2),
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now each resource type has its own table, but the serial primary key is allocated uniquely by Commentable. A given primary key value can be used only by one resource type.
CREATE TABLE Comments (
CommentId INT IDENTITY PRIMARY KEY,
ResourceId INT NOT NULL,
ResourceType INT NOT NULL,
FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
Now Comments reference Commentable resources, with referential integrity enforced. A given comment can reference only one resource type. There's no possibility of anomalies or conflicting resource ids.
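To see the constraints at work, here is a minimal sketch. It assumes SQLite driven from Python's sqlite3 module rather than SQL Server, so INTEGER columns with hand-picked ids stand in for INT IDENTITY. The is_accepted helper and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE Commentable (
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL,
    PRIMARY KEY (ResourceId, ResourceType)
);
CREATE TABLE Documents (
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL CHECK (ResourceType = 1),
    FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
CREATE TABLE Comments (
    CommentId    INTEGER PRIMARY KEY,
    ResourceId   INTEGER NOT NULL,
    ResourceType INTEGER NOT NULL,
    FOREIGN KEY (ResourceId, ResourceType) REFERENCES Commentable
);
-- Made-up sample data: resource 1 is a document with one comment.
INSERT INTO Commentable VALUES (1, 1);
INSERT INTO Documents   VALUES (1, 1);
INSERT INTO Comments (ResourceId, ResourceType) VALUES (1, 1);
""")


def is_accepted(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A comment on a resource that was never registered in Commentable.
orphan_accepted = is_accepted(
    "INSERT INTO Comments (ResourceId, ResourceType) VALUES (999, 1)")
# A Documents row that claims to be a project (type 2).
mistyped_accepted = is_accepted("INSERT INTO Documents VALUES (1, 2)")

print(orphan_accepted, mistyped_accepted)
```

Both attempts should be refused by the database itself, without any trigger or application code.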
I cover more about polymorphic associations in my presentation Practical Object-Oriented Models in SQL and my book SQL Antipatterns.

Read up on database normalization.
Nulls in the way you describe would be a big indication that the database isn't designed properly.
You need to split up all your tables so that the data held in them is fully normalized. This will save you a lot of time further down the line, guaranteed, and it's a much better habit to get into.

From a foreign key perspective, the first example is better because you can have multiple foreign key constraints on a column but the data has to exist in all those references. It's also more flexible if the business rules change.

To continue from @OMG Ponies' answer, what you describe in the second example is called a Polymorphic Association, where the foreign key ResourceID may reference rows in more than one table. However, in SQL databases, a foreign key constraint can only reference exactly one table. The database cannot enforce the foreign key according to the value in CommentTypeID.
You may be interested in checking out the following Stack Overflow post for one solution to tackle this problem:
MySQL - Conditional Foreign Key Constraints
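A small sketch of what the missing constraint means in practice. It assumes SQLite via Python's sqlite3 module; the type codes (1 = document, 2 = project) and the sample rows are invented for illustration. Because ResourceID cannot carry a foreign key, a comment pointing at a non-existent project is stored without complaint, and finding such rows takes a join that branches on CommentTypeID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enabled, but there is no FK to enforce
conn.executescript("""
CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY);
CREATE TABLE Projects  (ProjectID  INTEGER PRIMARY KEY);
CREATE TABLE Comments (
    CommentID     INTEGER PRIMARY KEY,
    CommentTypeID INTEGER NOT NULL,  -- assumed codes: 1 = document, 2 = project
    ResourceID    INTEGER NOT NULL   -- target table varies, so no FOREIGN KEY
);
INSERT INTO Documents VALUES (1);
INSERT INTO Comments (CommentTypeID, ResourceID) VALUES (1, 1);
INSERT INTO Comments (CommentTypeID, ResourceID) VALUES (2, 42);  -- project 42 does not exist
""")

# Orphan detection has to be done by hand, one LEFT JOIN per resource type.
orphans = conn.execute("""
    SELECT c.CommentID
    FROM Comments c
    LEFT JOIN Documents d ON c.CommentTypeID = 1 AND d.DocumentID = c.ResourceID
    LEFT JOIN Projects  p ON c.CommentTypeID = 2 AND p.ProjectID  = c.ResourceID
    WHERE d.DocumentID IS NULL AND p.ProjectID IS NULL
""").fetchall()

print(orphans)  # the comment on the missing project
```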

The first approach is not great, since it is quite denormalized. Each time you add a new entity type, you need to update the table. You may be better off making this an attribute of document - i.e. store the comment inline in the document table.
For the ResourceID approach to work with referential integrity, you will need to have a Resource table, and a ResourceID foreign key in all of your Document, Project etc. entities (or use a mapping table). Making "ResourceID" a jack-of-all-trades that can be a documentID, projectID etc. is not a good solution, since it cannot be used for sensible indexing or a foreign key constraint.
To normalize, you need to split the comment table into one table per resource type.
Comment
-------
CommentID
CommentText
...etc
DocumentComment
---------------
DocumentID
CommentID
ProjectComment
--------------
ProjectID
CommentID
If only one comment is allowed, then you add a unique constraint on the foreign key for the entity (DocumentID, ProjectID etc.) This ensures that there can only be one row for the given item and so only one comment. You can also ensure that comments are not shared by using a unique constraint on CommentID.
EDIT: Interestingly, this is almost parallel to the normalized implementation of ResourceID - replace "Comment" in the table name, with "Resource" and change "CommentID" to "ResourceID" and you have the structure needed to associate a ResourceID with each resource. You can then use a single table "ResourceComment".
If there are going to be other entities that are associated with any type of resource (e.g. audit details, access rights, etc.), then using the resource mapping tables is the way to go, since it will allow you to add normalized comments and any other resource-related entities.

I wouldn't go with either of those solutions. Depending on some of the specifics of your requirements you could go with a super-type table:
CREATE TABLE Commentable_Items (
commentable_item_id INT NOT NULL,
CONSTRAINT PK_Commentable_Items PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Projects (
commentable_item_id INT NOT NULL,
... (other project columns)
CONSTRAINT PK_Projects PRIMARY KEY CLUSTERED (commentable_item_id))
GO
CREATE TABLE Documents (
commentable_item_id INT NOT NULL,
... (other document columns)
CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (commentable_item_id))
GO
If each item can only have one comment and comments are not shared (i.e. a comment can only belong to one entity) then you could just put the comments in the Commentable_Items table. Otherwise you could link the comments off of that table with a foreign key.
I don't like this approach very much in your specific case though, because "having comments" isn't enough to put items together like that in my mind.
I would probably go with separate Comments tables (assuming that you can have multiple comments per item - otherwise just put them in your base tables). If a comment can be shared between multiple entity types (i.e., a document and a project can share the same comment) then have a central Comments table and multiple entity-comment relationship tables:
CREATE TABLE Comments (
comment_id INT NOT NULL,
comment_text NVARCHAR(MAX) NOT NULL,
CONSTRAINT PK_Comments PRIMARY KEY CLUSTERED (comment_id))
GO
CREATE TABLE Document_Comments (
document_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Document_Comments PRIMARY KEY CLUSTERED (document_id, comment_id))
GO
CREATE TABLE Project_Comments (
project_id INT NOT NULL,
comment_id INT NOT NULL,
CONSTRAINT PK_Project_Comments PRIMARY KEY CLUSTERED (project_id, comment_id))
GO
If you want to constrain comments to a single document (for example) then you could add a unique index (or change the primary key) on the comment_id within that linking table.
It's all of these "little" decisions that will affect the specific PKs and FKs. I like this approach because each table is clear on what it is. In databases that's usually better than having "generic" tables/solutions.
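As a rough illustration of the linking-table variant with the optional "one document per comment" rule, here is a sketch. It assumes SQLite via Python's sqlite3 module instead of SQL Server, so CLUSTERED and NVARCHAR(MAX) are dropped. The index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Documents (document_id INTEGER PRIMARY KEY);
CREATE TABLE Comments (
    comment_id   INTEGER PRIMARY KEY,
    comment_text TEXT NOT NULL
);
CREATE TABLE Document_Comments (
    document_id INTEGER NOT NULL REFERENCES Documents (document_id),
    comment_id  INTEGER NOT NULL REFERENCES Comments (comment_id),
    PRIMARY KEY (document_id, comment_id)
);
-- Optional rule: a comment may be linked to one document only.
CREATE UNIQUE INDEX UQ_Document_Comments_comment
    ON Document_Comments (comment_id);

INSERT INTO Documents VALUES (1), (2);
INSERT INTO Comments VALUES (10, 'First draft looks fine');
INSERT INTO Document_Comments VALUES (1, 10);
""")

# Try to attach the same comment to a second document.
try:
    conn.execute("INSERT INTO Document_Comments VALUES (2, 10)")
    shared_allowed = True
except sqlite3.IntegrityError:
    shared_allowed = False

print(shared_allowed)
```

Dropping the unique index turns the same tables back into a plain many-to-many link, which is the kind of "little" decision the PKs and FKs end up encoding.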

Of the options you give, I would go for number 2.

Option 2 is a good way to go. The issue that I see with that is you are putting the resource key on that table. Each of the IDs from the different resources could be duplicated. When you join resources to the comments you will more than likely come up with comments that do not belong to that particular resource. This would be considered a many-to-many join. I would think a better option would be to have your resource tables, the comments table, and then tables that cross-reference the resource type and the comments table.

If you carry the same sort of data about all comments regardless of what they are comments about, I'd vote against creating multiple comment tables. Maybe a comment is just "thing it's about" and text, but if you don't have other data now, it's likely you will: date the comment was entered, user id of person who made it, etc. With multiple tables, you have to repeat all these column definitions for each table.
As noted, using a single reference field means that you could not put a foreign key constraint on it. This is too bad, but it doesn't break anything; it just means you have to do the validation with a trigger or in code. More seriously, joins get difficult. You can't just say "from comment join document using (documentid)". You need a complex join based on the value of the type field.
So while the multiple pointer fields are ugly, I tend to think that's the right way to go. I know some db people say there should never be a null field in a table, that you should always break it off into another table to prevent that from happening, but I fail to see any real advantage to following this rule.
Personally I'd be open to hearing further discussion on pros and cons.
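For the sake of that discussion, a minimal sketch of the multiple-pointer layout. It assumes SQLite via Python's sqlite3 module; the Title, Name and CommentText columns and the sample rows are invented. Each nullable pointer gets a real foreign key, and the join stays the plain "join ... using" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Projects  (ProjectID  INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Comments (
    CommentID   INTEGER PRIMARY KEY,
    DocumentID  INTEGER NULL REFERENCES Documents (DocumentID),
    ProjectID   INTEGER NULL REFERENCES Projects (ProjectID),
    CommentText TEXT NOT NULL
);
INSERT INTO Documents VALUES (1, 'Spec');
INSERT INTO Projects  VALUES (1, 'Website');
INSERT INTO Comments (DocumentID, ProjectID, CommentText) VALUES (1, NULL, 'Needs review');
INSERT INTO Comments (DocumentID, ProjectID, CommentText) VALUES (NULL, 1, 'On schedule');
""")

# Simple join: rows whose DocumentID is NULL just drop out.
document_comments = conn.execute("""
    SELECT d.Title, c.CommentText
    FROM Comments c
    JOIN Documents d USING (DocumentID)
""").fetchall()

# The per-column foreign key still rejects a pointer to a missing document.
try:
    conn.execute(
        "INSERT INTO Comments (DocumentID, CommentText) VALUES (99, 'Lost')")
    bad_reference_accepted = True
except sqlite3.IntegrityError:
    bad_reference_accepted = False

print(document_comments, bad_reference_accepted)
```

Nothing here stops a row from filling in both pointers; if that matters, a CHECK constraint counting the non-NULL columns would be one way to add it.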

Pawnshop Application:
I have separate tables for Loan, Purchase, Inventory & Sales transactions.
Each table's rows are joined to their respective customer rows by:
customer.pk [serial] = loan.fk [integer];
= purchase.fk [integer];
= inventory.fk [integer];
= sale.fk [integer];
I have consolidated the four tables into one table called "transaction", where a column:
transaction.trx_type char(1) {L=Loan, P=Purchase, I=Inventory, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic transaction table where for example:
transaction.main_amount DECIMAL(7,2)
in a loan transaction holds the pawn amount,
in a purchase holds the purchase price,
in inventory and sale holds sale price.
This is clearly a denormalized design, but it has made programming a lot easier and improved performance. Any type of transaction can now be performed from within one screen, without the need to change to different tables.

Related

SQL Server use same Guid as primary key in 2 tables

We have 2 tables with a 1:1 relationship.
One table should reference the other; typically one would use a FK relationship.
Since there is a 1:1 relationship, we could also directly use the same Guid in both tables as primary key.
Additional info: the data is split into 2 tables since the data is rather separate, think "person" and "address" - but in a world where there is a clear 1:1 relationship between the 2.
Based on the tags that were suggested to me, I assume this is called a "shared primary key".
Would using the same Guid as PK in 2 tables have any ill effects?
To consolidate info from comments into answer...
No, there are no ill effects of two tables sharing PK.
You will still need to create a FK reference from the second table; the FK column will be the same as the PK column.
Though, your example of "Person" and "Address" in a 1:1 situation is not the best fit. The common usage of this practice is for entities that extend one another. For example: the table "User" can hold common info on all users, while the tables "Candidate" and "Recruiter" can each expand on it, and all tables can share the same PK. The programming language representation would also be classes that extend one another.
Another (similar) example would be a table that stores more detailed info than the base table, like "User" and "UserDetails". It's 1:1 and there is no need to introduce an additional PK column.
Code sample where PK is also a FK:
CREATE TABLE [User]
(
id INT PRIMARY KEY
, name NVARCHAR(100)
);
CREATE TABLE [Candidate]
(
id INT PRIMARY KEY FOREIGN KEY REFERENCES [User](id)
, actively_looking BIT
);
CREATE TABLE [Recruiter]
(
id INT PRIMARY KEY
, currently_hiring BIT
, FOREIGN KEY (id) REFERENCES [User](id)
);
PS: As mentioned, a GUID is not the best-suited column for a PK due to performance issues, but that's another topic.
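A quick sketch of how the shared key behaves. It assumes SQLite via Python's sqlite3 module rather than SQL Server, with INTEGER standing in for INT and BIT. The is_accepted helper and the sample users are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE "User" (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE Candidate (
    id               INTEGER PRIMARY KEY REFERENCES "User" (id),  -- PK and FK at once
    actively_looking INTEGER
);
INSERT INTO "User" VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Candidate VALUES (1, 1);
""")


def is_accepted(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# The primary key keeps the relationship 1:1 ...
duplicate_accepted = is_accepted("INSERT INTO Candidate VALUES (1, 0)")
# ... and the foreign key keeps every Candidate tied to an existing User.
unknown_user_accepted = is_accepted("INSERT INTO Candidate VALUES (99, 1)")

# No extra key column is needed to join the two tables.
rows = conn.execute("""
    SELECT u.id, u.name, c.actively_looking
    FROM "User" u
    LEFT JOIN Candidate c ON c.id = u.id
    ORDER BY u.id
""").fetchall()

print(duplicate_accepted, unknown_user_accepted, rows)
```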

Problems on having a field that will be null very often on a table in SQL Server

I have a column that sometimes will be null. This column is also a foreign key, so I want to know if I'll have problems with performance or with data consistency if this column will have weight
I know its a foolish question but I want to be sure.
There is not necessarily a problem with this, other than that it is likely an indication that you might have a poorly normalized design. There might be performance implications due to the way indexes are structured and the sparseness of the column with nulls, but without knowing your structure or intended querying scenarios any conclusions one might draw would be pure speculation.
A better solution might be a shared primary key where table A has a primary key, and there is zero or one records in B with the same primary key.
If table A can have one or zero B, but more than one A can refer to B, then what you have is a one to many relationship. This can be represented as Pieter laid out in his answer. This allows multiple A records to refer to the same B, and in turn each B may optionally refer to an A.
So you see there are two optional structures to address this problem, and choosing between them is not guesswork. There is a distinct rationale for choosing one or the other, but it depends on the nature of the relationships you are modelling.
Instead of this design:
create table Master (
ID int identity not null primary key,
DetailID int null references Detail(ID)
)
go
create table Detail (
ID int identity not null primary key
)
go
consider this instead
create table Master (
ID int identity not null primary key
)
go
create table Detail (
ID int identity not null primary key,
MasterID int not null references Master(ID)
)
go
Now the Foreign Key is never null, rather the existence (or not) of the Detail record indicates whether it exists.
If a Detail can exist for multiple records, create a mapping table to manage the relationship.
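A minimal sketch of the second design. It assumes SQLite via Python's sqlite3 module with explicit ids in place of identity columns, and made-up sample rows. Whether a Master has a Detail is read from the presence of a row, not from a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Master (
    ID INTEGER PRIMARY KEY
);
CREATE TABLE Detail (
    ID       INTEGER PRIMARY KEY,
    MasterID INTEGER NOT NULL REFERENCES Master (ID)  -- never NULL
);
INSERT INTO Master (ID) VALUES (1), (2);
INSERT INTO Detail (ID, MasterID) VALUES (100, 1);    -- only Master 1 has a Detail
""")

# has_detail is 1 when a Detail row exists for the Master, otherwise 0.
rows = conn.execute("""
    SELECT m.ID, d.ID IS NOT NULL AS has_detail
    FROM Master m
    LEFT JOIN Detail d ON d.MasterID = m.ID
    ORDER BY m.ID
""").fetchall()

print(rows)
```

As written, one Master may own several Detail rows. Adding a UNIQUE constraint on MasterID (my assumption, not part of the answer) would limit it to at most one.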

SQL sub-types with overlapping child tables

Consider the problem above where the 'CommonChild' entity can be a child of either sub-type A or B, but not C. How would I go about designing the physical model in a relational [SQL] database?
Ideally, the solution would allow...
for an identifying relationship between CommonChild and its related sub-type.
a 1:N relationship.
Possible Solutions
Add an additional sub-type to the super-type and move sub-type A and B under the new sub-type. The CommonChild can then have a FK constraint on the newly created sub-type. Works for the above, but not if an additional entity is added which can have a relationship with sub-type A and C, but not B.
Add a FK constraint between the CommonChild and SuperType. Use a trigger or check constraint (w/ UDF) against the super-type's discriminator before allowing a new tuple into CommonChild. Seems straight forward, but now CommonChild almost seems like new subtype itself (which it is not).
My model is fundamentally flawed. Remodel and the problem should go away.
I'm looking for other possible solutions or confirmation of one of the above solutions I've already proposed.
Thanks!
EDIT
I'm going to implement the exclusive foreign key solution provided by Branko Dimitrijevic (see accepted answer).
I am going to make a slight modification in this case as:
the super-type, sub-type, and "CommonChild" all have the same PKs and;
the PKs are 3 column composites.
The modification is to create an intermediate table whose sole role is to enforce the exclusive FK constraint between the sub-types and the "CommonChild" (exact model provided by Dimitrijevic minus the "CommonChild's" attributes). The CommonChild's PK will have a normal FK constraint to the intermediate table.
This will prevent the "CommonChild" from having 2 sets of 3 column composite FKs. Plus, since the identifying relationship is maintained from super-type to "CommonChild", [read] queries can effectively ignore the intermediate table altogether.
Looks like you need a variation of exclusive foreign keys:
CREATE TABLE CommonChild (
Id AS COALESCE(SubTypeAId, SubTypeBId) PERSISTED PRIMARY KEY,
SubTypeAId int REFERENCES SubTypeA (SuperId),
SubTypeBId int REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL)
)
);
There are a couple of things to note here:
There are two NULL-able FOREIGN KEYs.
There is a CHECK that allows exactly one of these FKs to be non-NULL.
There is a computed column Id which equals one of the FKs (whichever is currently non-NULL) which is also a PRIMARY KEY. This ensures that:
One parent cannot have multiple children.
A "grandchild" table can reference the CommonChild.Id directly from its FK. The SuperType.Id is effectively propagated all the way down.
We don't have to mess with NULL-able UNIQUE constraints, which are problematic in MS SQL Server (see below).
A DBMS-agnostic way of doing something similar would be...
CREATE TABLE CommonChild (
Id int PRIMARY KEY,
SubTypeAId int UNIQUE REFERENCES SubTypeA (SuperId),
SubTypeBId int UNIQUE REFERENCES SubTypeB (SuperId),
Attr6 varchar,
CHECK (
(SubTypeAId IS NOT NULL AND SubTypeAId = Id AND SubTypeBId IS NULL)
OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL AND SubTypeBId = Id)
)
)
Unfortunately a UNIQUE column containing more than one NULL is not allowed by MS SQL Server, which is not the case in most DBMSes. However, you can just omit the UNIQUE constraint if you don't want to reference SubTypeAId or SubTypeBId directly.
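To make the behaviour of the DBMS-agnostic variant concrete, here is a sketch. It assumes SQLite via Python's sqlite3 module, which is one of the engines that accept several NULLs in a UNIQUE column. The parent ids and the is_accepted helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE SubTypeA (SuperId INTEGER PRIMARY KEY);
CREATE TABLE SubTypeB (SuperId INTEGER PRIMARY KEY);
CREATE TABLE CommonChild (
    Id         INTEGER PRIMARY KEY,
    SubTypeAId INTEGER UNIQUE REFERENCES SubTypeA (SuperId),
    SubTypeBId INTEGER UNIQUE REFERENCES SubTypeB (SuperId),
    Attr6      TEXT,
    CHECK (
        (SubTypeAId IS NOT NULL AND SubTypeAId = Id AND SubTypeBId IS NULL)
        OR (SubTypeAId IS NULL AND SubTypeBId IS NOT NULL AND SubTypeBId = Id)
    )
);
INSERT INTO SubTypeA VALUES (1), (3), (5);
INSERT INTO SubTypeB VALUES (2), (6);
""")


def is_accepted(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    # Exactly one parent, and Id equals that parent's id: allowed.
    "child_of_a": is_accepted(
        "INSERT INTO CommonChild (Id, SubTypeAId) VALUES (1, 1)"),
    "child_of_b": is_accepted(
        "INSERT INTO CommonChild (Id, SubTypeBId) VALUES (2, 2)"),
    # Second row with a NULL SubTypeBId: fine where UNIQUE ignores NULLs.
    "another_child_of_a": is_accepted(
        "INSERT INTO CommonChild (Id, SubTypeAId) VALUES (3, 3)"),
    # Neither parent set: the CHECK refuses it.
    "no_parent": is_accepted(
        "INSERT INTO CommonChild (Id) VALUES (4)"),
    # Both parents set: the CHECK refuses it.
    "two_parents": is_accepted(
        "INSERT INTO CommonChild (Id, SubTypeAId, SubTypeBId) VALUES (5, 5, 6)"),
    # Second child for parent 1: SubTypeAId must equal Id, and 1 is taken.
    "second_child_same_parent": is_accepted(
        "INSERT INTO CommonChild (Id, SubTypeAId) VALUES (7, 1)"),
}

print(results)
```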
Wondering what am I missing here?
Admittedly, it is hard without having the wording of the specific problem, but things do feel a bit upside-down.

SQL One-to-One Relationship Definition

I'm designing a database and I'm not sure how to define one of the relationships. Here's the situation:
An invoice is created
If the product is not in stock then it needs to be manufactured and so a work order is created.
The relationship is one-to-one. However work orders are sometimes created for other purposes so the WorkOrder table will also be linked to other tables in a similar one-to-one relationship. Also, some Invoices won't have a work order at all. This means I can't define these relationships in the normal way by using the same primary key in both tables. Instead of doing this I've created a linking table and then set unique indexes on both fields to define the one-to-one relationship (see image).
(source: markevans.org)
Is this the best way?
Cheers
Mark
EDIT: I just realised that this design will allow a single work order to be linked to an invoice and also to one of the other tables I mentioned via 2 linking tables. I guess no solution is perfect.
Okay, this answer is SQL Server specific, but should be adaptable to other RDBMSs, with a little work. So far as I see, we have the following constraints:
An invoice may be associated with 0 or 1 Work Orders
A Work Order must be associated with an invoice or an ABC or a DEF
I'd design the WorkOrder table as follows:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
/* Other Columns */
InvoiceID int null,
ABCID int null,
DEFID int null,
/* Etc for other possible links */
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint FK_WorkOrder_Invoices FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_WorkOrder_ABC FOREIGN KEY (ABCID) references ABC (ABCID),
/* Etc for other FKs */
constraint CK_WorkOrders_SingleFK CHECK (
CASE WHEN InvoiceID is null THEN 0 ELSE 1 END +
CASE WHEN ABCID is null THEN 0 ELSE 1 END +
CASE WHEN DEFID is null THEN 0 ELSE 1 END
/* + other FK columns */
= 1
)
)
So, basically, this table is constrained to only FK to one other table, no matter how many FK columns are defined. If necessary, a computed column could tell you the "Type" of item that this is linked to, based on which FK column is non-null, or the type and a single int column could be real columns, and InvoiceID, ABCID, etc could be computed columns.
The final thing to ensure is that an invoice only has 0 or 1 Work Orders. If your RDBMS ignores nulls in unique constraints, this is as simple as applying such a constraint to each FK column. For SQL Server, you need to use a filtered index (>=2008) or an indexed view (<=2005). I'll just show the filtered index:
CREATE UNIQUE INDEX IX_WorkOrder_UniqueInvoices on
WorkOrder (InvoiceID) where (InvoiceID is not null)
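The two rules together (exactly one link per work order, at most one work order per invoice) can be tried out with the sketch below. It assumes SQLite via Python's sqlite3 module, where a partial index plays the role of SQL Server's filtered index. Only the Invoice and ABC links are modelled, and the index name, helper and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Invoice (InvoiceID INTEGER PRIMARY KEY);
CREATE TABLE ABC     (ABCID     INTEGER PRIMARY KEY);
CREATE TABLE WorkOrder (
    WorkOrderID INTEGER PRIMARY KEY,
    InvoiceID   INTEGER NULL REFERENCES Invoice (InvoiceID),
    ABCID       INTEGER NULL REFERENCES ABC (ABCID),
    CONSTRAINT CK_WorkOrders_SingleFK CHECK (
        CASE WHEN InvoiceID IS NULL THEN 0 ELSE 1 END +
        CASE WHEN ABCID     IS NULL THEN 0 ELSE 1 END
        = 1
    )
);
-- Unique only among rows that actually point at an invoice.
CREATE UNIQUE INDEX IX_WorkOrder_UniqueInvoices
    ON WorkOrder (InvoiceID) WHERE InvoiceID IS NOT NULL;

INSERT INTO Invoice VALUES (1);
INSERT INTO ABC VALUES (1), (2);
INSERT INTO WorkOrder (InvoiceID, ABCID) VALUES (1, NULL);
INSERT INTO WorkOrder (InvoiceID, ABCID) VALUES (NULL, 1);
INSERT INTO WorkOrder (InvoiceID, ABCID) VALUES (NULL, 2);  -- second NULL InvoiceID is fine
""")


def is_accepted(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second work order for invoice 1 collides with the partial unique index.
second_for_invoice = is_accepted(
    "INSERT INTO WorkOrder (InvoiceID, ABCID) VALUES (1, NULL)")
# A work order linked to nothing fails the single-FK CHECK.
unlinked = is_accepted(
    "INSERT INTO WorkOrder (InvoiceID, ABCID) VALUES (NULL, NULL)")

print(second_for_invoice, unlinked)
```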
Another way to deal with keeping WorkOrders straight is to include a WorkOrder type column in WorkOrder (e.g. 'Invoice','ABC','DEF'), include a computed column, or a column constrained by a check constraint, that contains the matching value in the link table, and introduce a second foreign key:
CREATE TABLE WorkOrder (
WorkOrderID int IDENTITY(1,1) not null,
Type varchar(10) not null,
constraint PK_WorkOrder PRIMARY KEY (WorkOrderID),
constraint UQ_WorkOrder_TypeCheck UNIQUE (WorkOrderID,Type),
constraint CK_WorkOrder_Types CHECK (Type in ('INVOICE','ABC','DEF'))
)
CREATE TABLE Invoice_WorkOrder (
InvoiceID int not null,
WorkOrderID int not null,
Type varchar(10) not null default 'INVOICE',
constraint PK_Invoice_WorkOrder PRIMARY KEY (InvoiceID),
constraint UQ_Invoice_WorkOrder_OrderIDs UNIQUE (WorkOrderID),
constraint FK_Invoice_WorkOrder_Invoice FOREIGN KEY (InvoiceID) references Invoice (InvoiceID),
constraint FK_Invoice_WorkOrder_WorkOrder FOREIGN KEY (WorkOrderID) references WorkOrder (WorkOrderID),
constraint FK_Invoice_WorkOrder_TypeCheck FOREIGN KEY (WorkOrderID,Type) references WorkOrder (WorkOrderID,Type),
constraint CK_Invoice_WorkOrder_Type CHECK (Type = 'INVOICE')
)
The only disadvantage to this model, although closer to your original proposal, is that you can have a work order that isn't actually linked to any other item (although it claims to be for, e.g., an INVOICE).
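A sketch of how the second foreign key does its job. It assumes SQLite via Python's sqlite3 module, with the constraint names dropped for brevity and invented sample rows. A work order of type 'ABC' cannot be attached to an invoice, because the link row can only carry Type = 'INVOICE' and the pair (WorkOrderID, Type) has to exist in WorkOrder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Invoice (InvoiceID INTEGER PRIMARY KEY);
CREATE TABLE WorkOrder (
    WorkOrderID INTEGER PRIMARY KEY,
    Type        TEXT NOT NULL CHECK (Type IN ('INVOICE', 'ABC', 'DEF')),
    UNIQUE (WorkOrderID, Type)            -- target of the composite FK below
);
CREATE TABLE Invoice_WorkOrder (
    InvoiceID   INTEGER PRIMARY KEY REFERENCES Invoice (InvoiceID),
    WorkOrderID INTEGER NOT NULL UNIQUE,
    Type        TEXT NOT NULL DEFAULT 'INVOICE' CHECK (Type = 'INVOICE'),
    FOREIGN KEY (WorkOrderID, Type) REFERENCES WorkOrder (WorkOrderID, Type)
);
INSERT INTO Invoice VALUES (1), (2);
INSERT INTO WorkOrder VALUES (10, 'INVOICE'), (11, 'ABC');
INSERT INTO Invoice_WorkOrder (InvoiceID, WorkOrderID) VALUES (1, 10);
""")


def is_accepted(sql):
    """Run one statement and report whether the constraints let it through."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Work order 11 is an 'ABC' order: (11, 'INVOICE') does not exist in WorkOrder.
wrong_type_link = is_accepted(
    "INSERT INTO Invoice_WorkOrder (InvoiceID, WorkOrderID) VALUES (2, 11)")
# Lying about the type in the link row is stopped by the CHECK.
forged_type_link = is_accepted(
    "INSERT INTO Invoice_WorkOrder (InvoiceID, WorkOrderID, Type) "
    "VALUES (2, 11, 'ABC')")

print(wrong_type_link, forged_type_link)
```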
What you have looks to be a perfectly normal way to construct your tables.
If you think you might like to use only one link table between your WorkOrder table and whatever other tables that may have WorkOrders, you could use a link table like:
WorkOrders
OtherId (Could be InvoiceId, or an ID for SomethingElse that may have a WorkOrder)
OtherType (ENUM - something like 'Invoice', 'SomethingElse')
WorkOrderId
So the issue is that you can have invoices that don't have work orders and work orders that don't have invoices but the two need to be linked when there is a link. I would say based upon that description that your database diagram is pretty good. This would open you up to allowing more than a one-to-one relationship. This way down the road you can consider having two work orders for one invoice. You might also have one work order that handles two invoices. This opens you up to a lot of possibilities that you may not need now but that you might in the future.
I would recommend your current design. In the future, you may want to add more information about the link between invoice and work order. This middle table will allow you to add this information.
In the interest of fairness to the other side of the coin, you do need to consider speed/number of tables/etc. that this will cause. For example, you have now created a third table which increased your table count by 50% in this example. Look at the rest of your database. If you did this everywhere, you would probably have the most normalized database but it might not be the most performant because of all the joins that are necessary. Basically, this isn't a "one-size-fits-all" solution. Instead it is a design choice. Personally, I hate nullable foreign key fields. I find they don't give me the granularity I usually want with my database designs.
Your schema corresponds to a many-to-many link between the 2 tables. You are de facto opening here the possibility to have one work order for multiple invoices, and multiple work orders for one invoice. The model offers then possibilities far above the rules you are setting.
You could use a simpler schema, that will reflect the (0,1) relation between work orders and invoices, and the (0,1) relation between Invoices and Work orders:
A Work Order can be independent of an invoice, or linked to one specific invoice: it has a (0,1) relation to the Invoice table
An invoice can have no work orders, or one work order: it has a (0,1) relation to the Work Orders table
Such a relation can be translated by the following model and rules
Invoice
id_Invoice, Primary Key
WorkOrder
id_WorkOrder, Primary Key
id_Invoice, Foreign Key, Nulls accepted, unique value
With such a structure, it will be easy to add new 'dependants' to work orders table. If, for example, you want to open the possibility to launch work orders from restocking orders (where you want to have minimal quantities of some items in stock), you can then just add the corresponding field to the WorkOrder table:
id_RestockingOrder, ForeignKey, Nulls accepted, unique value
You'll be then able to 'see' from where your WorkOrder comes: an invoice, a restocking order, etc.
Seems it corresponds to your needs.
Edit:
As noted by @mark, SQL Server will not allow multiple null values in a unique column, in contradiction with the ANSI specs (check here for some more details). As we do not want to wait for SQL Server 2011 to have this rule implemented, there is a workaround here, where you can build a view excluding the null values and set a unique index on this view. I must admit that I did not like this solution ...
There is still the possibility to implement the 'unique if not null' rule in your code. It will still be simpler than implementing the many-to-many model (with the Invoice_WorkOrder table) you are proposing and managing all the additional uniqueness rules that you'll need to implement.
There is no real need for the link table, just have them linked directly and allow for NULL in the reference field of the work order. Because a work order can be linked to multiple tables what I would do is add a reference id on every work order to every table that can link from it. So you would have:
Invoice
PK - ID
FK - WorkOrderID
SomeOtherTable
PK - ID
FK - WorkOrderID
WorkOrder
PK - ID
FK - InvoiceID (allow NULL)
FK - SomeOtherTableID (allow NULL)
To make sure a WorkOrder is linked to only one item, you have to use code to validate the row (or perhaps a stored procedure which I cannot come up with right now).
EDIT: PS, if you want to use a link table, give it a generic name and add all the linked tables with the same sort of construct I just described, allowing for NULLs. In my eyes adding the extra table makes the schema larger than it needs to be, but if a work order contains a lot of big text fields it could increase performance slightly and reduce database size with all the indexes flying around. In anything but the largest applications, I would consider it over-normalization though, but that is a matter of style.

T-SQL Tag Database Architecture Design?

Scenario
I am building a database that contains a series of different tables. These consist of a COMMENTS table, a BLOGS table & an ARTICLES table. I want to be able to add new items to each table, and tag them with between 0 and 5 tags to help the user search for particular information that is relevant more easily.
Initial thoughts for architecture
My first thoughts were to have a centralised table of TAGS. This table would list all of the available tags using a TagID field & a TagName field. Since each item can have many tags and each tag can have many items, I would need a MANY-TO-MANY relationship between each item table and the TAGS table.
For Example:
Many COMMENTS can have many TAGS.
Many TAGS can have many COMMENTS.
Many ARTICLES can have many TAGS.
Many TAGS can have many ARTICLES.
etc.....
Current Understanding
From previous experience I understand that a way of implementing this structure in T-SQL is to have an adjoining table between the COMMENTS table and the TAG table. This adjoining table would contain the CommentID & the TagID, as well as its own unique CommentTagID. This structure would also apply to all other items.
Questions
Firstly is this the right way to go about implementing such a database architecture? If not, what other methods would be feasible? Since the database will eventually contain a lot of information, I need to ensure that it is scalable. Is this a scalable implementation?
If I had lots of these tables would this architecture make CRUD operations very slow?
Should I use GUIDs or Incrementing INTs for the ID fields?
Help & suggestions would be appreciated greatly.
Thank you.
You may also want to look at WordPress schema and database description to see how others are solving a similar problem.
Keeping a centralized table of tags is a good idea if you will ever need to do one of the following:
Build a complete list of all tags (that is mixing blog tags, comment tags and article tags)
Update the tags so that they get updated everywhere: so that when you change sqlserver to sql-server, it gets changed everywhere: in blogs, articles and comments.
Option 1 is very useful to build the tag clouds so I'd recommend to build a table of tags and reference it from your tables.
If you won't ever need to update the tags as described in option 2, you don't ever need a surrogate key for them.
You will most probably need a UNIQUE constraint on them anyway and there is no point not to make it a PRIMARY KEY, if you are not going to update them.
This will also save you lots of joins: you don't need to join with the tags table to show the tags.
GUIDs are more simple to manage, but they make the indexes and link tables quite large in size.
You can assign a numerical identifier to each table and link like this:
tTag (tag VARCHAR(30) NOT NULL PRIMARY KEY)
tTaggable (type INT NOT NULL, id INT NOT NULL, PRIMARY KEY (type, id))
tTagLink (
tag VARCHAR(30) NOT NULL FOREIGN KEY REFERENCES tTag,
type INT NOT NULL, id INT NOT NULL,
PRIMARY KEY (tag, type, id),
FOREIGN KEY (type, id) REFERENCES tTaggable
)
tBlog (
id INT NOT NULL PRIMARY KEY,
type INT NOT NULL, CHECK(type = 1),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tArticle (
id INT NOT NULL PRIMARY KEY,
blog INT NOT NULL FOREIGN KEY REFERENCES tBlog,
type INT NOT NULL, CHECK(type = 2),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tComment (
id INT NOT NULL PRIMARY KEY,
article INT NOT NULL FOREIGN KEY REFERENCES tArticle,
type INT NOT NULL, CHECK(type = 3),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
Note that if you want to delete a blog, an article or a comment, you should delete from tTaggable as well.
This way, tTaggable is only used to ensure the referential integrity. To query all tags for an article, you just issue this query:
SELECT tag
FROM tTagLink
WHERE type = 2
AND id = 1234567
, so you get all tags by querying a single table, without any joins.
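Here is a runnable sketch of that lookup. It assumes SQLite via Python's sqlite3 module in place of SQL Server, and leaves out the tBlog, tArticle and tComment tables to keep it short. The tag names and the second taggable row are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE tTag (tag TEXT NOT NULL PRIMARY KEY);
CREATE TABLE tTaggable (
    type INTEGER NOT NULL,
    id   INTEGER NOT NULL,
    PRIMARY KEY (type, id)
);
CREATE TABLE tTagLink (
    tag  TEXT NOT NULL REFERENCES tTag (tag),
    type INTEGER NOT NULL,
    id   INTEGER NOT NULL,
    PRIMARY KEY (tag, type, id),
    FOREIGN KEY (type, id) REFERENCES tTaggable (type, id)
);
INSERT INTO tTag VALUES ('sql-server'), ('indexing');
INSERT INTO tTaggable VALUES (2, 1234567);  -- an article
INSERT INTO tTaggable VALUES (3, 1234567);  -- a comment that happens to share the id
INSERT INTO tTagLink VALUES ('sql-server', 2, 1234567);
INSERT INTO tTagLink VALUES ('indexing',   2, 1234567);
INSERT INTO tTagLink VALUES ('sql-server', 3, 1234567);
""")

# All tags of article 1234567, read from the link table alone.
article_tags = [row[0] for row in conn.execute(
    "SELECT tag FROM tTagLink WHERE type = 2 AND id = 1234567 ORDER BY tag")]

# Tagging something that is not registered in tTaggable is rejected.
try:
    conn.execute("INSERT INTO tTagLink VALUES ('indexing', 1, 5)")
    untracked_tagged = True
except sqlite3.IntegrityError:
    untracked_tagged = False

print(article_tags, untracked_tagged)
```

The type column is what keeps the article and the comment apart even though they share the same numeric id.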
Usually a many-to-many relationship is implemented exactly as you describe it.
Auto-incrementing IDs are a good idea since they are guaranteed to be unique.
And you can use GUIDs if you want to tag comments and articles with the same tag (instead of 6 tables you need just 5). But searching with GUIDs may be slower.